Hi Iban,
I think we need some more information to be able to help you. Here is what would help:
- Red Hat version: Is it 7.2, 7.3, or 7.4?
- Red Hat kernel version: The GPFS FAQ lists the recommended kernel levels.
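You can check the running kernel on each node with:
[root@XXXX ~]# uname -r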
- Platform: Is it x86_64?
- Is there a reason for you to stay on 4.2.3-6? Could you update to 4.2.3-9 or 5.0.1?
- How is the name resolution? Can you test a ping from one node to another, and also the reverse lookup? A quick sketch is shown below.
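Something like this checks forward/reverse resolution and reachability, using the node names from your mmlscluster listing below (adjust the list as needed):
for h in gpfs01.ifca.es gpfs02.ifca.es gpfsgui.ifca.es; do
  ip=$(getent hosts "$h" | awk '{print $1}')                       # forward lookup
  echo "$h -> $ip -> $(getent hosts "$ip" | awk '{print $2}')"     # reverse lookup
  ping -c 1 -W 2 "$h" > /dev/null && echo "  ping OK" || echo "  ping FAILED"
done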
- TCP/IP tuning: Which TCP/IP parameters are you using? For 7.4 I have used the following:
[root@XXXX sysctl.d]# cat 99-ibmscale.conf
net.core.somaxconn = 10000
net.core.netdev_max_backlog = 250000
net.ipv4.ip_local_port_range = 2000 65535
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_fin_timeout = 10
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 4194304
net.core.wmem_max = 4194304
net.core.optmem_max = 4194304
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
vm.min_free_kbytes = 512000
kernel.panic_on_oops = 0
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
vm.swappiness = 0
vm.dirty_ratio = 10
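If you adjust these, they can be reloaded without a reboot, for example (assuming the file lives in /etc/sysctl.d):
[root@XXXX sysctl.d]# sysctl -p /etc/sysctl.d/99-ibmscale.conf
or, to reload everything under /etc/sysctl.d:
[root@XXXX sysctl.d]# sysctl --system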
Since we disabled IPv6, we had to rebuild the initramfs image with the following command:
[root@XXXX ~]# dracut -f -v
- GPFS tuning parameters: Can you list them?
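You can list them with:
[root@XXXX ~]# /usr/lpp/mmfs/bin/mmlsconfig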
- Spectrum Scale status: Can you send the output of the following commands:
mmgetstate -a -L
mmlscluster
mmhealth cluster show
mmhealth cluster show --verbose
mmhealth node eventlog
mmlsnode -L -N waiters
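Also, to see which peers hold the CLOSE_WAIT connections on gpfs01, something along these lines should give a per-address count (same netstat filter you already used):
[root@gpfs01 ~]# netstat -putan | grep 1191 | grep CLOSE_WAIT | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head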
Abraços / Regards / Saludos,

Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services

Phone: 55-19-2132-4317
E-mail: [email protected]
----- Original message -----
From: Iban Cabrillo <[email protected]>
Sent by: [email protected]
To: [email protected]
Cc:
Subject: [gpfsug-discuss] Thousands of CLOSE_WAIT connections
Date: Fri, Jun 15, 2018 9:12 AM
Dear all,

We have recently reinstalled, going from GPFS 3.5 to Spectrum Scale 4.2.3-6 on RedHat 7. We are running two NSD servers and a GUI; there is no firewall on the GPFS network, and SELinux is disabled. I have tried switching the manager and cluster manager node between servers with the same result: server 01 always keeps increasing the CLOSE_WAIT connections:

Node  Daemon node name  IP address   Admin node name  Designation
--------------------------------------------------------------------------------
1 gpfs01.ifca.es 10.10.0.111 gpfs01.ifca.es quorum-manager-perfmon
2 gpfs02.ifca.es 10.10.0.112 gpfs02.ifca.es quorum-manager-perfmon
3 gpfsgui.ifca.es 10.10.0.60 gpfsgui.ifca.es quorum-perfmon
.......

Installation and configuration work fine, but now we see that one of the servers does not close the mmfsd connections, and this grows forever, while the other NSD server always stays in the same range:

[root@gpfs01 ~]# netstat -putana | grep 1191 | wc -l
19701
[root@gpfs01 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT | wc -l
19528
....
[root@gpfs02 ~]# netstat -putana | grep 1191 | wc -l
215
[root@gpfs02 ~]# netstat -putana | grep 1191 | grep CLOSE_WAIT | wc -l
0

This is causing gpfs01 not to answer cluster commands. The NSDs are balanced between the servers (same size):

[root@gpfs02 ~]# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
gpfs nsd1 gpfs01,gpfs02
gpfs nsd2 gpfs01,gpfs02
gpfs nsd3 gpfs02,gpfs01
gpfs nsd4 gpfs02,gpfs01
.....

The processes seem to be similar on both servers; only mmccr is running on server 1 and not on 2.

gpfs01
#######
root 9169 1 0 feb07 ? 22:27:54 python /usr/lpp/mmfs/bin/mmsysmon.py
root 11533 6154 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_fs_info all
root 11713 1 0 13:41 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 12367 11533 0 13:43 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr vget mmRunningCommand
root 12641 6162 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmsdrquery sdrq_nsd_info sdrq_nsd_name:sdrq_fs_name:sdrq_storage_pool
root 12668 12641 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr fget -c 835 mmsdrfs /var/mmfs/gen/mmsdrfs.12641
root 12950 11713 0 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 12959 9169 13 13:44 ? 00:00:00 /usr/lpp/mmfs/bin/mmccr check -Y -e
root 12968 3150 0 13:45 pts/3 00:00:00 grep --color=auto mm
root 19620 26468 38 jun14 ? 11:28:36 /usr/lpp/mmfs/bin/mmfsd
root 19701 2 0 jun14 ? 00:00:00 [mmkproc]
root 19702 2 0 jun14 ? 00:00:00 [mmkproc]
root 19703 2 0 jun14 ? 00:00:00 [mmkproc]
root 26468 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs

[root@gpfs02 ~]# ps -feA | grep mm
root 5074 1 0 feb07 ? 01:00:34 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 5128 31456 28 jun14 ? 06:18:07 /usr/lpp/mmfs/bin/mmfsd
root 5255 2 0 jun14 ? 00:00:00 [mmkproc]
root 5256 2 0 jun14 ? 00:00:00 [mmkproc]
root 5257 2 0 jun14 ? 00:00:00 [mmkproc]
root 15196 5074 0 13:47 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root 15265 13117 0 13:47 pts/0 00:00:00 grep --color=auto mm
root 31456 1 0 jun05 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs

Any ideas will be appreciated.

Regards, I

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
