Why aren't you using SRM to modify the network settings during failover?

- Sean

> On Jun 24, 2016, at 8:30 AM, Michael Leone <[email protected]> wrote:
>
>> On Fri, Jun 24, 2016 at 12:04 PM, Rubens Almeida <[email protected]>
>> wrote:
>> Here's my 2 cents on this matter: I'm still waiting to see when a Windows
>> server host will handle 2 gateways without trouble. I'm used to seeing this
>> at every customer I'm assigned to as SME in my day job. Every one of them
>> has this kind of issue to one degree or another. What I do is: on the
>> production NIC I set the customer's gateway. On all other NICs no gateway
>> at all. If needed, I then set persistent routes pointing to the respective
>> gateway handling that specific network. Hope that helps!
>
> As I said, there are no other NICs. Also, in case of disaster, I don't want
> to have to edit 175 VMs, to set addressing on a previously unused NIC
> (script-based or not). I need an automatic dead-gateway detection and
> failover, apparently.
>
>>
>> Rubens
>>
>>> On Fri, Jun 24, 2016 at 12:23 PM, Michael Leone <[email protected]> wrote:
>>> Here's my setup: I have a lot of VMware VMs. We also use their SRM (Site
>>> Recovery Manager) for Disaster Recovery. Basically, SRM lets the VMs fail
>>> over to another site, in case of disaster. They will keep their current IP
>>> addressing.
>>>
>>> So what we did was set 2 gateways on each VM - first entry is x.x.x.1,
>>> which is the gateway at the production site. Second entry is x.x.x.2, which
>>> is the gateway at the recovery site. This way, if the VMs did fail over,
>>> they would still be able to find a gateway and continue to work (since
>>> theoretically x.x.x.1 would not be available, being a smoldering pile of
>>> ash or whatever). Note that these are all 1 NIC machines, no multi-homing.
>>> And all static addressing, no DHCP.
>>>
>>> I seem to recall testing this a couple years ago, and it worked fine.
>>> However, I'm old, so who knows how faulty my memory is ...
>>>
>>> Here's the problem - yesterday the recovery site went down. Mind you, the
>>> main production site stayed up, and in fact, has never gone down. But then
>>> I started getting weird calls - I couldn't ping some VMs, yet others on the
>>> same subnet as I am had no difficulties.
>>>
>>> Eventually, what I had to do was delete the x.x.x.2 gateway entry from the
>>> problematical machines, flush their DNS cache, and then everyone could
>>> access these VMs again.
>>>
>>> But why? Since the main production site switch never went down, none of
>>> the VMs should have been using the recovery site as a gateway; they should
>>> all have been using x.x.x.1, and the fact that x.x.x.2 was unavailable
>>> should not have mattered to them in the slightest.
>>>
>>> And even if they were using the recovery site x.x.x.2 as gateway, once it
>>> dropped, the VM should have still been able to use the other entry, the
>>> production site switch x.x.x.1, as a gateway and continued to be available.
>>>
>>> So, 3 questions then:
>>>
>>> 1. Am I wrong in believing that a Windows machine (Win 2008 R2 and Win 2012
>>> R2) will use the gateways in the order listed? (i.e., use x.x.x.1 first,
>>> and not try to use x.x.x.2 unless x.x.x.1 is unavailable). Seems most of my
>>> VMs worked this way, but not all, yet all are configured the same way.
>>>
>>> 2. And, if the gateway in use (for example, x.x.x.2) becomes unavailable, I
>>> thought Windows would automatically try the other entry, without any user
>>> intervention. Is this not so?
>>>
>>> 3. What I want is for the VMs to use the first gateway listed. If it
>>> can't reach or use that, then I want it to automatically use the next entry
>>> in the gateway list. Is this possible? If so, then how?
>>>
>>> Thanks for any help.
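
For reference, the ordered-failover behaviour asked for in question 3 can be sketched in a few lines of Python. This only illustrates the selection logic (use the first listed gateway that answers, otherwise move to the next); it is not how Windows implements dead gateway detection. The names `pick_gateway` and `is_reachable`, and the 192.0.2.x addresses standing in for x.x.x.1 and x.x.x.2, are made up for the example.

```python
from typing import Callable, Optional, Sequence


def pick_gateway(
    gateways: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first gateway, in listed order, that passes the probe.

    `is_reachable` is a hypothetical probe supplied by the caller
    (a ping, an ARP check, ...); returns None if no gateway answers.
    """
    for gw in gateways:
        if is_reachable(gw):
            return gw
    return None


if __name__ == "__main__":
    # Placeholder addresses standing in for x.x.x.1 (production)
    # and x.x.x.2 (recovery); not taken from the original setup.
    gateways = ["192.0.2.1", "192.0.2.2"]

    # Pretend the recovery-site gateway is down, as in the incident described.
    down = {"192.0.2.2"}
    print(pick_gateway(gateways, lambda gw: gw not in down))
```

With this logic, losing the second entry makes no difference as long as the first one still answers, which is the behaviour expected in the original post.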

