On Tue, 2016-02-09 at 09:04 +0000, Agblad Tore wrote:
> Hi, just a long shot perhaps, checking if anyone else have had this problem
> and solved it:
>
> Two Linux servers running TSM(backup server) software, one x86 and on s390x
> (RHEL 6.7)
>
> Both have an extra NIC connected to a separate backup LAN, separate OSA
> adapters + separate cables all the way.
>
> Connection test method: telnet ipaddress 1500
>
> Connect from s390x via that vlan and port 1500 into the x86 server works, but
> takes 5-10 seconds first time. And again 5-10 seconds if no traffic for about
> 5 minutes.
>
> Connect from x86 same method does not work, unless a connection from s390x
> was made the last 5 minutes.
>
> It seems the arp cache in the x86 server is updated and that server finds it
> way after that, but timeout is 5 minutes, so after that it does not find it's
> way.
>
> I took a tcpdump on that backup lan interface, got some help understanding it
> using wireshark, and obviously the x86 connection tries does initiate an arp
> query to the s390x server that also replies with its mac address, as it
> should. Still this does not help.
> Route tables also looks as expected, should not cause a problem.
>
> Anyone having seen this problem ?
> If you also have the solution you will make my day :-)
>
> BR /Tore

Your description suggests the networking is specific to this application.
You did not say whether this is a new implementation, but your test method
suggests it might be.

I have seen similar situations in the past. Do not ignore the intervening
network equipment (switches and routers). They have port configurations and
forwarding-related state information that may be the real culprit. Routing
issues tend to be hard failures, unless you are dealing with flaky
connections. To me the symptoms suggest a problem in a switch adjacent to
one or both of the servers. That is what I would explore if I were
troubleshooting the issue. The additional connectivity methods suggested by
the other poster might expose where the communication is failing.

In large organizations the networking and server teams, particularly when
mainframes are involved, make assumptions about what the other is doing that
do not always mesh. Getting your networking staff involved may be helpful
here. They can look at connectivity failures, if any, and at the forwarding
tables related to MAC addresses. A duplicate MAC address might be part of
the problem. Only the networking team can look at these factors.

Just suggestions for investigation,
Harold Grovesteen

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
