swill opened a new pull request #2190: Fixed the negotiation of S2S VPN connections
URL: https://github.com/apache/cloudstack/pull/2190

I am not sure whether a regression was introduced with https://github.com/apache/cloudstack/pull/2062, but we have found issues with this configuration now that we have it in production.

It turns out that the tests we had scripted for this feature did not sleep long enough. With the current configuration there are situations where the connection is established for 5-10 seconds and then drops. My tests checked whether the connection was established within that window, so they always reported success even though the connection dropped after the test.

@syed and I have done about 20 hours of testing and troubleshooting to get to the root cause of the problem. Previously, with both sides of the VPN connection set to `auto=start`, the two sides had problems negotiating the connection, so it would not initialize reliably.

The core of this change is to switch the S2S VPN config from `auto=start` to `auto=route`. Read more about this setting here: https://wiki.strongswan.org/projects/strongswan/wiki/ConnSection

With `auto=route`, the connection is only established once there is an attempt to send traffic over it. To open the connection as soon as the VPN connection is configured, a ping to the other side of the connection has been added.
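The testing pitfall described above (a single check that passes inside the 5-10 second window before the connection drops) can be avoided by requiring the connected state to hold for a sustained period. The following is a minimal sketch of that idea, not the test code used for this PR. The function name `stays_connected`, the 60 second hold window and the 5 second poll interval are all assumptions chosen for illustration, and `is_connected` stands in for whatever probe a test uses (for example, reading the connection State or pinging a VM in the other VPC).

```python
import time
from typing import Callable


def stays_connected(
    is_connected: Callable[[], bool],
    hold_seconds: float = 60.0,   # assumed value, well past the 5-10 s window
    poll_interval: float = 5.0,   # assumed value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if is_connected() holds for the whole hold window.

    A single successful probe is not enough: the connection must be reported
    as up on every poll from the first probe until hold_seconds have elapsed.
    """
    deadline = clock() + hold_seconds
    while True:
        if not is_connected():
            return False  # dropped (or never came up) inside the window
        if clock() >= deadline:
            return True   # still up at the end of the window
        sleep(poll_interval)
```

The `clock` and `sleep` parameters exist only so the helper can be exercised with a fake clock. A real test would pass just the probe, for example `stays_connected(probe)` where `probe` is a hypothetical function that returns `True` while the connection is reported as connected.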

I was able to consistently reproduce failure scenarios with the previous configuration by doing the following:
- Enable VPN Gateways on two different VPCs
- Create both VPN Customer Gateways on the two VPCs
- After 10 seconds, create the first VPN Connection
- After 5 seconds, create the second VPN Connection
- Wait 35 seconds to let the `checks2svpn.sh` check run
- List the VPN Connections to validate the connection State is correct
- SSH into a VM in one VPC and ping a VM on the other VPC.
- Tear down the VPN Connections and Gateways
- Wait 150 seconds between tests to ensure the ipsec VPN running state is cleared (I believe this is related to the `dpdtimeout=120`, but I did not spend the time to isolate that for sure).

Here are the results after applying this fix and testing with the above process:

```
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| Status | IKE                    | ESP                    | DPD   | Encap | IKE Life | ESP Life | Passive       | Conn State                  |
+========+========================+========================+=======+=======+==========+==========+===============+=============================+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | True  | 86400    | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False |          | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    |          | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False |          |          | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    | 3600     | True : True   | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    | 3600     | False : True  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | True  | False | 86400    | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | False | False | 86400    | 3600     | False : False | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | False | False | 86400    | 3600     | True : False  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | False | False | 86400    | 3600     | True : True   | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1;modp1536   | False | False | 86400    | 3600     | False : True  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha1;modp1536   | aes128-sha1            | True  | False | 86400    | 3600     | True : False  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha256;modp3072 | aes128-sha384;modp2048 | True  | False | 86400    | 3600     | True : False  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha384;modp4096 | aes128-sha512;modp6144 | True  | False | 86400    | 3600     | True : False  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
| OK     | aes128-sha512;modp8192 | aes128-sha384;modp8192 | True  | False | 86400    | 3600     | True : False  | Connected : Connected       |
+--------+------------------------+------------------------+-------+-------+----------+----------+---------------+-----------------------------+
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
With regards, Apache Git Services
