Hi,

We're currently running our production zLinux environment under z/VM v5.1 using a quasi-VSWITCH networking configuration. By "quasi-", I mean that we use a VSWITCH to connect our two OSA adapters to TCPIP, and TCPIP also connects to a guest LAN to which the zLinux guests are attached. Routing between the OSA adapters (which reside in separate IP subnets) is done by MPROUTE/OSPF. That configuration works normally in our production environment.
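For anyone not familiar with this kind of setup, the arrangement amounts to roughly the following. This is a from-memory sketch rather than our actual statements: the names (PRODSW, LINLAN, OSDVSW, etc.) and device numbers are placeholders I've made up, I've shown only one OSA to keep it short, and the addresses are the ones that appear in the DR picture further down.

   CP commands (or their SYSTEM CONFIG equivalents):
      DEFINE VSWITCH PRODSW RDEV 500 CONTROLLER *
      SET VSWITCH PRODSW GRANT TCPIP
      DEFINE LAN LINLAN OWNERID SYSTEM TYPE QDIO

   TCPIP's directory entry (virtual NICs coupled to the VSWITCH and the guest LAN;
   each Linux guest has a similar NICDEF pointing at LINLAN):
      NICDEF 600 TYPE QDIO LAN SYSTEM PRODSW
      NICDEF 700 TYPE QDIO LAN SYSTEM LINLAN

   TCPIP PROFILE (one QDIO device/link per virtual NIC, plus the VIPA):
      DEVICE OSDVSW  OSD 600
      LINK   LNKVSW  QDIOETHERNET OSDVSW
      DEVICE OSDLAN  OSD 700
      LINK   LNKLAN  QDIOETHERNET OSDLAN
      DEVICE VIPADEV VIRTUAL 0
      LINK   VIPALNK VIRTUAL 0 VIPADEV
      HOME
         10.8.128.14  LNKVSW
         10.8.185.1   LNKLAN
         10.8.190.2   VIPALNK
      START OSDVSW
      START OSDLAN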
Our normal disaster recovery configuration is the same except that (1) our VM system runs second-level under a DR vendor's (IBM BRCS) VM system, (2) we use only a single OSA adapter instead of two, and (3) the network connecting our customer workstations to our VM system is several major hops away (from our DR site to Sterling Forest, NY, to Gaithersburg, MD). Here is sort of a picture of this configuration, concentrating mainly on the VM aspect:

   Workstation --- Network Cloud --- OSA Adapter --- VSWITCH (10.8.128.14) --- TCPIP (VIPA 10.8.190.2) --- Guest LAN Gateway (10.8.185.1) --- Linux Guests (10.8.185.2 - 10.8.185.30)

This configuration has worked during two prior disaster recovery tests. However, during our DR test yesterday, our customers were not able to connect to the Linux systems (10.8.185.2 - 10.8.185.30) or even ping them. We could connect (ping and log on) to our VM system itself and all of its IP addresses (10.8.128.14, 10.8.190.2, 10.8.185.1). After logging on to VM, I could successfully ping each of the Linux guests as well as the Guest LAN gateway (from VM). By logging onto the Linux guests via their VM consoles, I could ping all the other Linux guests, the Guest LAN interface, the VM VIPA address, and the VSWITCH address. The Linux guests could not ping a customer workstation.

On either side, TRACERT showed the roadblock to be what I'm told is a router used for DR at address 10.8.2.1; all of the traceroutes failed after passing this point. Our router people told us this router contained a route for our 10.8.185.* traffic (remember that we could ping 10.8.185.1 from customer workstations), and our firewall people told us no firewalls exist on that path of the network.

We eventually "circumvented" the problem by converting our VM system to a standard OSA environment, still using the Guest LAN but eliminating the VSWITCH. Instead of connecting to the VSWITCH, we connected our OSA adapter addresses directly to the VM TCPIP stack. None of the IP addresses changed, and we didn't change any part of the Guest LAN configuration. It was a shot in the dark that worked. The network/firewall folks claim they didn't change anything, and I believe them (unless their changes coincidentally happened in the five minutes it took to IPL VM). After converting to a direct OSA connection, everything started working.

Has anyone seen anything like this before? Do you have any idea why this might have happened?

I understand I haven't provided much specific information, but that's largely because I'm not sure what information is needed. I'm sure there are some NETSTAT GATE or SMSG MPROUTE OSPF commands I could have entered (our network folks were interested in displaying VM's routing table, but neither NETSTAT GATE nor SMSG MPROUTE OSPF ROUTERS impressed them very much). I would have been interested in a network trace at the point in BRCS' network environment where it connects to the OSA card, to see what packets were coming and going. Not that I could interpret that data, but I'm sure someone could. However, our network folks didn't think our failed packets were getting that far anyway, and we didn't have time left for a lot of tracing setup. I understand it's also possible to run some type of VM trace to see packets as well.

Can anyone suggest specific documentation I should gather in the event this problem occurs during our next DR test?
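To make that question more concrete, this is roughly the capture list I have in mind for the next test. The trace ID and VSWITCH name below are placeholders, and I haven't verified the exact TRSOURCE operands against the CP Commands reference, so please correct anything that's wrong or add whatever else you would want to see:

   State of the virtual network (from an authorized VM user):
      CP QUERY VSWITCH PRODSW DETAILS
      CP QUERY LAN DETAILS
      CP QUERY NIC DETAILS      (issued from TCPIP and from a Linux guest console)
      CP QUERY OSA

   The stack's own view, from TCPIP:
      NETSTAT HOME
      NETSTAT DEVLINKS
      NETSTAT GATE
      NETSTAT ARP *
      SMSG MPROUTE OSPF ROUTERS

   A packet-level trace of the VSWITCH with CP TRSOURCE:
      CP TRSOURCE ID VSWTRC TYPE LAN OWNER SYSTEM NAME PRODSW
      CP TRSOURCE ENABLE ID VSWTRC
         (reproduce the failing pings from a customer workstation)
      CP TRSOURCE DISABLE ID VSWTRC

I gather the resulting trace file can be formatted with TRACERED, and that there is also an IPFORMAT utility that turns it into something a packet analyzer can read, though I'm not sure which z/VM level introduced that. On the Linux side, I assume an ordinary tcpdump on each guest's interface would at least show whether any inbound packets ever arrive.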
Thanks,
Dennis Schaffer
Mutual of Omaha