On Tue, 2016-02-09 at 09:04 +0000, Agblad Tore wrote:
> Hi, just a long shot perhaps, checking if anyone else has had this problem
> and solved it:
>
> Two Linux servers running TSM (backup server) software, one x86 and one s390x
> (RHEL 6.7)
>
> Both have an extra NIC connected to a separate backup LAN, separate OSA 
> adapters + separate cables all the way.
>
> Connection test method: telnet ipaddress 1500
>
> Connecting from s390x via that VLAN to port 1500 on the x86 server works, but
> takes 5-10 seconds the first time, and again 5-10 seconds if there has been no
> traffic for about 5 minutes.
>
> Connecting from x86 by the same method does not work unless a connection from
> s390x was made within the last 5 minutes.
>
> It seems the ARP cache in the x86 server is updated and that server finds its
> way after that, but the timeout is 5 minutes, so after that it no longer finds
> its way.
>
> I took a tcpdump on that backup LAN interface, got some help understanding it
> using Wireshark, and the x86 connection attempt clearly does initiate an ARP
> query to the s390x server, which replies with its MAC address, as it
> should. Still this does not help.
> The route tables also look as expected and should not cause a problem.
>
> Has anyone seen this problem?
> If you also have the solution you will make my day :-)
>
> BR /Tore

Your description suggests the networking is specific to this
application.   You did not indicate if this is a new implementation but
your test method suggests it might be.

I have in the past seen similar situations.  Do not ignore intervening
network equipment, switch/router.  They have port configurations and
similar state information related to forwarding that may be the real
culprit.

Routing issues tend to be hard failures, unless you are dealing with
flaky connections.  The symptoms suggest to me a switch issue in a
switch adjacent to one or both of the servers.  That is what I would
be exploring if I were trying to troubleshoot the issue.

The additional connectivity methods suggested by the other poster might
expose where the communication is failing.
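
As one way to make the telnet test repeatable and to put numbers on the
5-10 second delay, here is a minimal Python sketch of a timed TCP connect.
It is only an illustration, not something from the original setup: the
function names timed_connect and probe_series are made up for this example,
and the 15 second default timeout is an assumption.

```python
import socket
import time


def timed_connect(host, port, timeout=15.0):
    """Attempt one TCP connection.

    Returns (connected, seconds_elapsed, error_text). error_text is empty
    when the connection succeeded.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused, unreachable and timed-out attempts
        return False, time.monotonic() - start, str(exc)


def probe_series(host, port, pauses, timeout=15.0):
    """Sleep for each pause (in seconds), then attempt a connection.

    Returns one timed_connect() result per entry in pauses.
    """
    results = []
    for pause in pauses:
        time.sleep(pause)
        results.append(timed_connect(host, port, timeout))
    return results
```

Run from each server toward the other, something like
probe_series("192.0.2.20", 1500, [0, 360]) (the address is a placeholder)
would try once immediately and once after six idle minutes, which should be
long enough to show whether the delay or failure comes back once the
5 minute window has passed.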

In large organizations the networking and server teams, particularly
when mainframes are involved, make assumptions about what the other is
doing that do not always mesh.

Getting your networking staff involved may be helpful here.  They can
look at connectivity failures, if any, and the forwarding tables related
to MAC addresses.  A duplicate MAC address might be part of the problem.
Only the networking team can look at these factors.
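
While waiting on the networking team, the servers' own neighbor (ARP) caches
can at least be checked for one MAC address appearing behind several IP
addresses. The sketch below is a rough illustration and assumes the usual
"ip neigh show" line layout (IP, "dev", interface, "lladdr", MAC, state); the
function names are made up for this example. A shared MAC is not proof of a
duplicate, since one host can legitimately own several addresses, and a
snapshot of one cache says nothing about the switch forwarding tables.

```python
from collections import defaultdict


def ips_by_mac(neigh_text):
    """Group IP addresses by MAC from text in `ip neigh show` format."""
    table = defaultdict(set)
    for line in neigh_text.splitlines():
        fields = line.split()
        # FAILED or INCOMPLETE entries carry no "lladdr" field; skip them.
        if "lladdr" not in fields:
            continue
        idx = fields.index("lladdr")
        if idx + 1 >= len(fields):
            continue
        mac = fields[idx + 1].lower()  # normalize case before comparing
        table[mac].add(fields[0])      # first field is the IP address
    return table


def shared_macs(neigh_text):
    """Return {mac: sorted list of IPs} for MACs seen behind more than one IP."""
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac(neigh_text).items()
        if len(ips) > 1
    }


# Made-up sample using documentation addresses, not data from the real LAN.
SAMPLE = """\
192.0.2.10 dev eth1 lladdr 02:00:00:00:00:0a REACHABLE
192.0.2.11 dev eth1 lladdr 02:00:00:00:00:0A STALE
192.0.2.12 dev eth1 lladdr 02:00:00:00:00:0c STALE
192.0.2.13 dev eth1  FAILED
"""
```

Feeding it the saved output of "ip neigh show" from both servers, taken right
after a working connection and again after a failed one, gives the networking
staff something concrete to compare against their MAC forwarding tables.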

Just suggestions for investigation,
Harold Grovesteen

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
