Hi,

We're currently running our production zLinux environment under z/VM v5.1
using a quasi-VSWITCH networking configuration.  By "quasi-", I mean that
we use VSWITCH to connect our two OSA adapters to TCPIP, and TCPIP is also
attached to a guest LAN, to which the zLinux guests are connected.  Routing
between the OSA adapters (which reside in separate IP subnets) is done by
MPROUTE/OSPF.  That configuration works normally in our production
environment.

Our normal disaster recovery configuration is the same except that (1) our
VM system runs second-level under a DR vendor's (IBM BRCS) VM system, (2)
we use only a single OSA adapter instead of two, and (3) the network
connecting our customer workstations to our VM system is several major hops
away (from our DR site to Sterling Forest, NY to Gaithersburg, MD).

Here is a rough picture of this configuration, concentrating mainly on the
VM side:


Workstation
  --- Network Cloud
  --- OSA Adapter
  --- VSWITCH (10.8.128.14)
  --- TCPIP (VIPA 10.8.190.2)
  --- Guest LAN Gateway (10.8.185.1)
  --- Linux Guests (10.8.185.2-10.8.185.30)
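The addresses in the picture sit in three separate subnets, so traffic for
the Linux guests has to be routed through the TCPIP stack. A minimal Python
sketch using the standard ipaddress module makes that concrete. It assumes
the guest LAN uses the tightest mask that fits 10.8.185.1-10.8.185.30; the
actual mask isn't stated in this post, and smallest_covering_network is a
hypothetical helper for illustration.

```python
import ipaddress

# Addresses from the configuration picture in this post.
VSWITCH_ADDR = ipaddress.ip_address("10.8.128.14")
VIPA_ADDR = ipaddress.ip_address("10.8.190.2")
GUEST_LAN_GW = ipaddress.ip_address("10.8.185.1")
LAST_GUEST = ipaddress.ip_address("10.8.185.30")


def smallest_covering_network(first, last):
    """Smallest IPv4 network in which first..last are all usable hosts.

    ASSUMPTION: the real guest LAN mask is not given in the post; this only
    finds the tightest mask that would fit the addresses shown.
    """
    for prefix in range(30, -1, -1):
        net = ipaddress.ip_network(f"{first}/{prefix}", strict=False)
        # Usable hosts exclude the network and broadcast addresses.
        if net.network_address < first and last < net.broadcast_address:
            return net
    return None


guest_net = smallest_covering_network(GUEST_LAN_GW, LAST_GUEST)
print(guest_net)                  # 10.8.185.0/27
print(VSWITCH_ADDR in guest_net)  # False
print(VIPA_ADDR in guest_net)     # False
```

The gateway plus guests exactly fill the usable range of a /27. A wider
guest LAN mask such as a /24 would not change the point, since it still
would not cover the VSWITCH or VIPA addresses.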

This configuration worked during our two prior disaster recovery tests.

However, during our DR test yesterday, our customers were not able to
connect to the Linux systems (10.8.185.2-10.8.185.30) or even ping them.
We could connect to (ping and log on to) our VM system itself and all of
its IP addresses (10.8.128.14, 10.8.190.2, 10.8.185.1).

After logging on to VM, I could successfully ping each of the Linux guests
as well as the Guest LAN gateway (from VM).  Logged on to the Linux guests
through their VM consoles, I could ping all the other Linux guests, the
Guest LAN interface, the VM VIPA address and the VSWITCH address.  However,
the Linux guests could not ping a customer workstation.
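The results above follow a single pattern: every address owned by the VM
TCPIP stack answers from outside, everything on the guest LAN answers from
inside VM, but traffic between a Linux guest and a workstation fails. A small
Python sketch that encodes only the results reported in this post checks
that this one rule accounts for every observation. The Probe record and the
function names are hypothetical, for illustration only.

```python
from dataclasses import dataclass

# Home addresses owned by the VM TCPIP stack, per the post.
VM_TCPIP_ADDRS = {"10.8.128.14", "10.8.190.2", "10.8.185.1"}


@dataclass(frozen=True)
class Probe:
    source: str      # "workstation", "vm" or "linux"
    target: str      # an IP address, or "workstation"
    succeeded: bool


# Ping results as reported; 10.8.185.2 stands in for any guest .2-.30.
PROBES = [
    Probe("workstation", "10.8.128.14", True),
    Probe("workstation", "10.8.190.2", True),
    Probe("workstation", "10.8.185.1", True),
    Probe("workstation", "10.8.185.2", False),
    Probe("vm", "10.8.185.2", True),
    Probe("linux", "10.8.185.1", True),
    Probe("linux", "10.8.190.2", True),
    Probe("linux", "10.8.128.14", True),
    Probe("linux", "workstation", False),
]


def is_guest_endpoint(endpoint):
    """True for a Linux guest, by name or by a non-TCPIP 10.8.185.x address."""
    if endpoint == "linux":
        return True
    return endpoint.startswith("10.8.185.") and endpoint not in VM_TCPIP_ADDRS


def guest_traffic_over_wan(probe):
    """True when a Linux guest talks to a workstation across the network."""
    ends = (probe.source, probe.target)
    return "workstation" in ends and any(is_guest_endpoint(e) for e in ends)


# A probe failed exactly when it was guest traffic crossing the network.
pattern_explains_all = all(p.succeeded != guest_traffic_over_wan(p) for p in PROBES)
print(pattern_explains_all)  # True
```

Recording the same kind of matrix at the next DR test, with one row per
guest instead of a stand-in, would show whether the rule still holds.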

From either direction, TRACERT showed the roadblock to be what I'm told is
a DR router at address 10.8.2.1; all of the traceroutes stopped at this
point.  Our router people told us this router contained a route for our
10.8.185.* traffic (remember that we could ping 10.8.185.1 from customer
workstations), and our firewall people told us there are no firewalls on
that path through the network.

We eventually "circumvented" the problem by converting our VM system to a
standard OSA environment, still using Guest LAN but eliminating VSWITCH.
Instead of connecting to VSWITCH, we connected our OSA adapter addresses
directly to the VM TCPIP stack.  None of the IP addresses changed, and we
didn't change any part of the Guest LAN configuration.  It was a shot in
the dark, but after converting to a direct OSA connection, everything
started working.  The network/firewall folks claim they didn't change
anything, and I believe them (unless their changes coincidentally happened
in the five minutes it took to IPL VM).

Has anyone seen anything like this before?   Do you have any idea why this
might have happened?

I understand I haven't provided much specific information, but that's
largely because I'm not sure what information is needed.  I'm sure there
are some NETSTAT or SMSG MPROUTE OSPF commands I could have entered.  (Our
network folks were interested in displaying VM's routing table, but
neither NETSTAT GATE nor SMSG MPROUTE OSPF ROUTERS impressed them very
much.)

I would have been interested in a network trace at the point in BRCS'
network environment where it connects to the OSA card, to see what packets
were coming and going.  Not that I could interpret that data, but I'm sure
someone could.  However, our network folks didn't think our failed packets
were getting that far anyway, and we didn't have enough time left for much
tracing setup.  I understand it's also possible to run some kind of VM
trace to see the packets.

Can anyone suggest specific documentation I should gather in the event this
problem occurs during our next DR test?

Thanks,

Dennis Schaffer
Mutual of Omaha
