Andrei, if I'm not mistaken I believe I saw the same behavior even on 4.8 - in our case, what I vaguely remember is that we configured Port Forwarding instead of Static NAT - it did solve our use case (for some customer), but maybe it's not acceptable for you...
Cheers

On Mon, 9 Jul 2018 at 18:27, Andrei Mikhailovsky <and...@arhont.com.invalid> wrote:

> Hi Rohit,
>
> I would like to send you a quick update on this issue. I have recently
> upgraded to 4.11.1.0 with the new system VM templates. The issue that I've
> described is still present in the latest release. Hasn't it been included
> in the latest 4.11 maintenance release? I thought it would be, as it
> breaks a major function of the VPC.
>
> Cheers.
>
> Andrei
>
> ----- Original Message -----
> > From: "Andrei Mikhailovsky" <and...@arhont.com.INVALID>
> > To: "dev" <dev@cloudstack.apache.org>
> > Sent: Friday, 20 April, 2018 11:52:30
> > Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >
> > Thanks
> >
> > ----- Original Message -----
> >> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
> >> To: "dev" <dev@cloudstack.apache.org>, "dev" <dev@cloudstack.apache.org>
> >> Sent: Friday, 20 April, 2018 10:35:55
> >> Subject: Re: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >>
> >> Hi Andrei,
> >>
> >> I've fixed this recently, please see
> >> https://github.com/apache/cloudstack/pull/2579
> >>
> >> As a workaround you can add routing rules manually. On the PR, there is
> >> a link to a comment that explains the issue and suggests a manual
> >> workaround. Let me know if that works for you.
> >>
> >> Regards.
> >>
> >> From: Andrei Mikhailovsky
> >> Sent: Friday, 20 April, 2:21 PM
> >> Subject: Upgrade from ACS 4.9.X to 4.11.0 broke VPC source NAT
> >> To: dev
> >>
> >> Hello, I have been posting to the users thread about this issue. Here
> >> is a quick summary, in case people contributing to the source NAT code
> >> on the VPC side would like to fix this issue.
> >>
> >> Problem summary: no connectivity between virtual machines behind two
> >> Static NAT networks.
> >> Problem case: When one virtual machine sends a packet to the external
> >> address of another virtual machine, where both are handled by the same
> >> router and both are behind Static NAT, the traffic does not work.
> >>
> >> 10.1.10.100    10.1.10.1:eth2    eth3:10.1.20.1    10.1.20.100
> >>    virt1            router                            virt2
> >>             178.248.108.77:eth1:178.248.108.113
> >>
> >> A single packet is sent from virt1 to virt2.
> >>
> >> Stage 1: it arrives at the router on eth2 and enters "nat_PREROUTING"
> >> (IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113), goes through the
> >> "10 1K DNAT all -- * * 0.0.0.0/0 178.248.108.113 to:10.1.20.100"
> >> rule, and has the DST DNATed to the internal IP of virt2.
> >>
> >> Stage 2: it enters the FORWARD chain and is DROPPED by the default
> >> policy:
> >> DROPPED: IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100
> >> The reason is that the OUT interface is not correctly changed from eth1
> >> to eth3 during nat_PREROUTING, so the packet is not matched by the
> >> FORWARD rule and thus not accepted:
> >> "24 14K ACL_INBOUND_eth3 all -- * eth3 0.0.0.0/0 10.1.20.0/24"
> >>
> >> Stage 3: with a manually inserted rule to accept this packet for
> >> forwarding, the packet enters the "nat_POSTROUTING" chain
> >> (IN= OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100), has the SRC changed to
> >> the external IP:
> >> "16 1320 SNAT all -- * eth1 10.1.10.100 0.0.0.0/0 to:178.248.108.77"
> >> and is sent to the external network on eth1:
> >> 13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request, id 2644, seq 2, length 64
> >>
> >> For some reason, during the nat_PREROUTING stage the DST IP is changed,
> >> but the OUT interface still reflects the interface associated with the
> >> old DST IP.
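Editor's note: the "manually inserted rule" from stage 3 is mentioned but never shown in the thread. A minimal, hedged sketch of what such a rule could look like, matching the SRC/DST of the packet dropped in stage 2 (the exact rule the poster used is not given, so the match criteria are an assumption; the helper below only prints the command rather than applying it, since applying iptables rules requires root on the VR):

```shell
# Print (do not apply) an iptables command that would accept the packet
# dropped in stage 2 (SRC=10.1.10.100 DST=10.1.20.100). The addresses come
# from the example above; matching on SRC/DST only is an assumption, since
# the drop log shows the (wrong) OUT=eth1 interface.
accept_forward_cmd() {
    src=$1; dst=$2
    printf 'iptables -I FORWARD -s %s -d %s -j ACCEPT\n' "$src" "$dst"
}

# Dry run: shows the command one would run on the VR.
accept_forward_cmd 10.1.10.100 10.1.20.100
```

On the VR one would run the printed command directly; it is shown here only as a dry run.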
> >> Here is the routing table:
> >>
> >> # ip route list
> >> default via 178.248.108.1 dev eth1
> >> 10.1.10.0/24 dev eth2  proto kernel  scope link  src 10.1.10.1
> >> 10.1.20.0/24 dev eth3  proto kernel  scope link  src 10.1.20.1
> >> 169.254.0.0/16 dev eth0  proto kernel  scope link  src 169.254.0.5
> >> 178.248.108.0/25 dev eth1  proto kernel  scope link  src 178.248.108.101
> >>
> >> # ip rule list
> >> 0:      from all lookup local
> >> 32761:  from all fwmark 0x3 lookup Table_eth3
> >> 32762:  from all fwmark 0x2 lookup Table_eth2
> >> 32763:  from all fwmark 0x1 lookup Table_eth1
> >> 32764:  from 10.1.0.0/16 lookup static_route_back
> >> 32765:  from 10.1.0.0/16 lookup static_route
> >> 32766:  from all lookup main
> >> 32767:  from all lookup default
> >>
> >> Further into the investigation, the problem was pinned down to these
> >> rules. All the traffic from the internal IPs on the static NATed
> >> connections was forced to go to the outside interface (eth1), by
> >> setting the mark 0x1 and then using the matching ip rule to direct it:
> >>
> >> # iptables -t mangle -L PREROUTING -vn
> >> Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
> >>  pkts bytes target    prot opt in  out  source       destination
> >>    49  3644 CONNMARK  all  --  *   *    10.1.10.100  0.0.0.0/0    state NEW CONNMARK save
> >>    37  2720 MARK      all  --  *   *    10.1.20.100  0.0.0.0/0    state NEW MARK set 0x1
> >>    37  2720 CONNMARK  all  --  *   *    10.1.20.100  0.0.0.0/0    state NEW CONNMARK save
> >>   114  8472 MARK      all  --  *   *    10.1.10.100  0.0.0.0/0    state NEW MARK set 0x1
> >>   114  8472 CONNMARK  all  --  *   *    10.1.10.100  0.0.0.0/0    state NEW CONNMARK save
> >>
> >> # ip rule
> >> 0:      from all lookup local
> >> 32761:  from all fwmark 0x3 lookup Table_eth3
> >> 32762:  from all fwmark 0x2 lookup Table_eth2
> >> 32763:  from all fwmark 0x1 lookup Table_eth1
> >> 32764:  from 10.1.0.0/16 lookup static_route_back
> >> 32765:  from 10.1.0.0/16 lookup static_route
> >> 32766:  from all lookup main
> >> 32767:  from all lookup default
> >>
> >> The acceptable solution is to delete those rules altogether.
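Editor's note: deleting those MARK rules can be scripted by parsing the mangle PREROUTING listing above, which is what the quick-and-dirty fix later in the thread does. A self-contained check of just the selection step, run against a few sample lines copied from that listing (no root needed, nothing is actually deleted):

```shell
# Extract the source column ($8) of the MARK-set-0x1 rules, skipping any
# line mentioning eth1 -- the same awk filter the quick fix feeds into
# 'iptables -t mangle -D PREROUTING'. Sample lines copied from the
# listing above; the SNAT line is included to show it gets filtered out.
sample='   37  2720 MARK       all  --  *      *       10.1.20.100  0.0.0.0/0  state NEW MARK set 0x1
  114  8472 MARK       all  --  *      *       10.1.10.100  0.0.0.0/0  state NEW MARK set 0x1
   16  1320 SNAT       all  --  *      eth1    10.1.10.100  0.0.0.0/0  to:178.248.108.77'

marked_sources=$(printf '%s\n' "$sample" | awk '/0x1/ && !/eth1/ {print $8}')
printf '%s\n' "$marked_sources"
```

On a live VR the `sample` variable would instead be the output of `iptables -t mangle -L PREROUTING -vn`.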
> >> The problem with such an approach is that traffic between the VPC tiers
> >> will use the internal IP addresses, so the packets going from
> >> 178.248.108.77 to 178.248.108.113 would be seen as communication
> >> between 10.1.10.100 and 10.1.20.100. Thus we need to apply two further
> >> rules:
> >>
> >> # iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
> >> # iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113
> >>
> >> in order to make sure that the packets leaving the router have the
> >> correct source IP. This way it is possible to have static NAT on all of
> >> the IPs within the VPC and ensure successful communication between
> >> them.
> >>
> >> So, for a quick and dirty fix, we ran this command on the VR:
> >>
> >> for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do
> >>     iptables -t mangle -D PREROUTING -s "$i" -m state --state NEW -j MARK --set-mark 0x1
> >> done
> >>
> >> The issue was introduced around the early 4.9.x releases, I believe.
> >>
> >> Thanks
> >> Andrei
> >>
> >> rohit.ya...@shapeblue.com
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> >> @shapeblue
>
> ----- Original Message -----
> From: "Andrei Mikhailovsky"
> To: "users"
> Sent: Monday, 16 April, 2018 22:32:25
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>
> Hello,
>
> I have done some more testing with the VPC network tiers and it seems
> that the Static NAT is indeed causing connectivity issues. Here is what
> I've done:
>
> Setup 1. I have created two test network tiers with one guest vm in each
> tier. Static NAT is NOT enabled. Each VM has a port forwarding rule (port
> 22) from its dedicated public IP address. ACLs have been set up to allow
> traffic on port 22 from the private IP addresses on each network tier.
>
> 1. ACLs seem to work just fine.
> Traffic between the networks flows according to the rules. Both vms can
> see each other's private IPs and can ping/ssh/etc.
>
> 2. From the Internet, hosts can access the vms on port 22.
>
> 4. The vms can also access each other and themselves on their public IPs.
> I don't think this worked before, but I could be wrong.
>
> Setup 2. Everything the same as Setup 1, but one public IP address has
> been set up as Static NAT to one guest vm. The second guest vm and second
> public IP remained unchanged.
>
> 1. ACLs stopped working correctly (see below)
> 2. From the Internet, hosts can access the vms on port 22, including the
>    Static NAT vm
> 3. Other guest vms can access the Static NAT vm using private & public
>    IP addresses
> 4. The Static NAT vm can NOT access other vms, using neither public nor
>    private IPs
> 5. The Static NAT vm can access Internet hosts (apart from the public IP
>    range belonging to the cloudstack setup)
>
> The above behaviour of the Setup 2 scenarios is very strange, especially
> points 4 & 5.
>
> Any thoughts anyone?
>
> Cheers
>
> ----- Original Message -----
>> From: "Rohit Yadav"
>> To: "users"
>> Sent: Thursday, 12 April, 2018 12:06:54
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>
>> Hi Andrei,
>>
>> Thanks for sharing. Yes, the egress thing is a known issue, caused by a
>> failure during VR setup to create the egress table. By performing a
>> restart of the network (without the cleanup option selected), the egress
>> table gets created and rules are successfully applied.
>>
>> The issue has been fixed in the VR downtime PR:
>> https://github.com/apache/cloudstack/pull/2508
>>
>> - Rohit
>>
>> ________________________________
>> From: Andrei Mikhailovsky
>> Sent: Tuesday, April 3, 2018 3:33:43 PM
>> To: users
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>
>> Rohit,
>>
>> Following the update from 4.9.3 to 4.11.0, I would like to comment on a
>> few things:
>>
>> 1. The upgrade went well, apart from the cloudstack-management server
>> startup issue that I've described in my previous email.
>> 2. There was an issue with the virtual router template upgrade. The
>> issue is described below:
>>
>> VR template upgrade issue:
>>
>> After updating the systemvm template I went to Infrastructure > Virtual
>> Routers and selected the Update template option for each virtual router.
>> The virtual routers were updated successfully using the new templates.
>> However, this has broken ALL Egress rules on all networks, and none of
>> the guest vms had working egress traffic. Port forwarding / incoming
>> rules were working just fine. Removal and addition of Egress rules did
>> not fix the issue. To fix the issue I had to restart each of the
>> networks with the Clean up option ticked.
>>
>> Cheers
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Andrei Mikhailovsky"
>>> To: "users"
>>> Sent: Monday, 2 April, 2018 21:44:27
>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>>
>>> Hi Rohit,
>>>
>>> Following some further investigation, it seems that the installation
>>> packages replaced the following file:
>>>
>>> /etc/default/cloudstack-management
>>>
>>> with
>>>
>>> /etc/default/cloudstack-management.dpkg-dist
>>>
>>> Thus, the management server couldn't load the env variables and was
>>> unable to start.
>>>
>>> I've put the file back and the management server is able to start.
>>>
>>> I will let you know if there are any other issues/problems.
>>>
>>> Cheers
>>>
>>> Andrei
>>>
>>> ----- Original Message -----
>>>> From: "Andrei Mikhailovsky"
>>>> To: "users"
>>>> Sent: Monday, 2 April, 2018 20:58:59
>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>>>
>>>> Hi Rohit,
>>>>
>>>> I have just upgraded and am having issues starting the service, with
>>>> the following error:
>>>>
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>>>> Failed to load environment files: No such file or directory
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>>>> Failed to run 'start-pre' task: No such file or directory
>>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack
>>>> Management Server.
>>>> -- Subject: Unit cloudstack-management.service has failed
>>>> -- Defined-By: systemd
>>>>
>>>> Cheers
>>>>
>>>> Andrei
>>>>
>>>> ----- Original Message -----
>>>>> From: "Rohit Yadav"
>>>>> To: "users"
>>>>> Sent: Friday, 30 March, 2018 19:17:48
>>>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>>>>
>>>>> Some of the upgrade and minor issues have been fixed and will make
>>>>> their way into 4.11.1.0. You're welcome to upgrade and share your
>>>>> feedback, but bear in mind that due to some changes a new/updated
>>>>> systemvmtemplate needs to be issued for 4.11.1.0 (it will be
>>>>> compatible with both the 4.11.0.0 and 4.11.1.0 releases, but 4.11.0.0
>>>>> users will have to register that new template).
>>>>>
>>>>> - Rohit
>>>>>
>>>>> ________________________________
>>>>> From: Andrei Mikhailovsky
>>>>> Sent: Friday, March 30, 2018 11:00:34 PM
>>>>> To: users
>>>>> Subject: Upgrade from ACS 4.9.3 to 4.11.0
>>>>>
>>>>> Hello,
>>>>>
>>>>> My current infrastructure is ACS 4.9.3 with KVM, based on Ubuntu
>>>>> 16.04 servers for the KVM hosts and the management server.
>>>>>
>>>>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was
>>>>> wondering if anyone had any issues during the upgrades? Anything to
>>>>> watch out for?
>>>>>
>>>>> I have previously seen issues with upgrading to 4.10, which required
>>>>> some manual db updates from what I recall. Has this issue been fixed
>>>>> in the 4.11 upgrade process?
>>>>>
>>>>> thanks
>>>>>
>>>>> Andrei

--
Andrija Panić
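
Editor's note on the /etc/default/cloudstack-management issue from the 2 April messages above: "putting the file back" can be sketched as follows. This simulates the broken state in a temporary directory rather than touching /etc, the file content is a placeholder, and the exact mechanics of the fix (copying the shipped .dpkg-dist file back to the expected name) are an assumption based on the thread:

```shell
# Simulate the broken state: after the package upgrade only the
# .dpkg-dist copy exists, so systemd cannot load the environment file
# for the cloudstack-management unit.
tmp=$(mktemp -d)
printf 'placeholder environment settings\n' > "$tmp/cloudstack-management.dpkg-dist"

# Restore the filename the service expects, keeping the shipped copy
# intact for reference.
cp "$tmp/cloudstack-management.dpkg-dist" "$tmp/cloudstack-management"
```

On a real system the directory would be /etc/default, and the service would be started afterwards with `systemctl start cloudstack-management`.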