[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301596#comment-15301596
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9339:
--------------------------------------------

GitHub user dsclose reopened a pull request:

    https://github.com/apache/cloudstack/pull/1519

    Cloudstack 9339: Virtual Routers do not handle Multiple Public Interfaces

    This PR addresses CLOUDSTACK-9339 and may need a code review from someone 
familiar with the System VM scripts. In particular, this PR has not been tested 
in a VPC RvR context. Only standalone routers and RvR routers have been 
demonstrated.
    
    - **d582358: Leave public interfaces down in backup redundant routers.** 
Previously backup routers were bringing all interfaces up and thus arping 
public IPs away from the master router.
    - **9ee1eb6: Add the default gateway to the main routing table when 
interfaces are configured.** The gateway for the first public IP was always 
being added to the main routing table. Sometimes a router would consequently 
add the gateway for an IP other than the default source-NAT IP. This would 
prevent outbound connectivity for guest VMs.
    - **ad9d72f: Add default gateway to device-specific routing tables.** 
Link-level routes were being put into the device-specific routing tables 
(accessed via firewall marks) but these are unnecessary. Instead, the default 
gateway is needed to allow the kernel to make an appropriate routing decision.
    - **8db879e: Only mark guest connections when they are part of a 
static-NAT.** Guest connections were being marked with a zero. This added no 
functionality and prevented static-NAT rules from routing outbound traffic 
properly as device-specific routing tables would not be used. Instead, all 
traffic would be routed out via the default public interface.
    - **788b1be: Allow forwarding and collect network stats on any public 
interface.** Forwarding rules and network stats were limited to eth2 on RvR 
networks. This needed to be decoupled from eth2 and reapplied to whichever 
interface was under consideration.
    - **b19e8aa: Ensure that CONNMARK --restore-mark only appears once.** This 
is a bit of a hack and could do with being improved. The CONNMARK rule was not 
being picked up by the de-duplication logic in CsNetfilter and was being added 
twice. This caused checksum errors on packets traversing NAT.
    - **bf285e1: Transition to master state should add all necessary routes.** 
Now that backup routers keep their interfaces down, the route logic executed at 
configuration-time cannot be applied. Instead, once the interface is brought up 
during a transition to master, routers must re-evaluate what routes are needed 
and add them. Unfortunately I couldn't see a way to re-use the existing route 
logic with the variables that I had in scope so there is some duplication. In 
some cases, routers did not successfully arp IPs away from the old master so 
some arp logic was added. During a failover most connections with guest VMs 
will be maintained with only minor packet loss. SSH sessions implemented via 
port-forwarding rules on an interface other than the source-NAT interface 
consistently get dropped, however, so the failover isn't quite seamless. It's 
possible that there's an easy fix for that.
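
As a rough illustration of the routing changes in 9ee1eb6 and ad9d72f, the sketch below builds the iproute2 commands that give one public interface its own routing table holding only a default gateway. It is not the code from this PR: the helper name, the `Table_<dev>` naming, and the use of the interface index as the firewall mark are all assumptions made for the example, and the addresses are documentation placeholders.

```python
def device_routing_commands(dev, gateway):
    """Build iproute2 commands for one public interface (hypothetical helper)."""
    # Assumption: the firewall mark is the interface index, e.g. eth3 -> 3.
    mark = int(dev[len("eth"):])
    # Assumption: each device has a table named after it.
    table = "Table_%s" % dev
    return [
        # Send connections carrying this mark to the device-specific table.
        "ip rule add fwmark %d table %s" % (mark, table),
        # The table holds a default gateway instead of link-level routes.
        "ip route add default via %s dev %s table %s" % (gateway, dev, table),
    ]


# Print the commands rather than executing them.
for command in device_routing_commands("eth3", "203.0.113.1"):
    print(command)
```

A named table such as the one assumed here would have to be declared in /etc/iproute2/rt_tables before these commands could be run on a router.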
    
    I expect that a number of tests may need to be modified/written as part of 
this PR. Any feedback or pointers would be useful as initially I'll be relying 
on the CI failures to tell me where to look.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dsclose/cloudstack CLOUDSTACK-9339

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/1519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1519
    
----
commit e7a63be161bdd14c985a8b483bffe4bfdaa3f5d4
Author: dean.close <dean.cl...@icloudhosting.com>
Date:   2016-05-09T10:31:26Z

    CLOUDSTACK-9339: Handle multiple public subnets on virtual routers.

----


> Virtual Routers don't handle Multiple Public Interfaces
> -------------------------------------------------------
>
>                 Key: CLOUDSTACK-9339
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9339
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Virtual Router
>    Affects Versions: 4.8.0
>            Reporter: dsclose
>              Labels: firewall, nat, router
>
> There is a series of issues with the way Virtual Routers manage multiple 
> public interfaces. These are more pronounced on redundant virtual router 
> setups. I have not attempted to examine these issues in a VPC context. 
> Outside of a VPC context, however, the following is expected behaviour:
> * eth0 connects the router to the guest network.
> * In RvR setups, keepalived manages the guests' gateway IP as a virtual IP on 
> eth0.
> * eth1 provides a local link to the hypervisor, allowing Cloudstack to issue 
> commands to the router.
> * eth2 is the router's public interface. By default, a single public IP will 
> be set up on eth2 along with the necessary iptables and ip rules to source-NAT 
> guest traffic to that public IP.
> * When a public IP address is assigned to the router that is on a separate 
> subnet to the source-NAT IP, a new interface is configured, such as eth3, and 
> the IP is assigned to that interface.
> * This can result in eth3, eth4, eth5, etc. being created depending upon how 
> many public subnets the router has to work with.
> The above all works. The following, however, is currently not working:
> * Public interfaces should be set to DOWN on backup redundant routers. The 
> master.py script is responsible for setting public interfaces to UP during a 
> keepalived transition. Currently the check_is_up method of the CsIP class 
> brings all interfaces UP on both routers of an RvR pair. A proposed fix for this has been 
> discussed on the mailing list. That fix will leave public interfaces DOWN on 
> RvR allowing the keepalived transition to control the state of public 
> interfaces. Issue #1413 includes a commit that contradicts the proposed fix 
> so it is unclear what the current state of the code should be.
> * Newly created interfaces should be set to UP on master redundant routers. 
> Assuming public interfaces should by default be DOWN on an RvR, we need to 
> accommodate the fact that, as interfaces are created, no keepalived 
> transition occurs. This means that assigning an IP from a new public subnet 
> will have no effect (as the interface will be down) until the network is 
> restarted with a "clean up."
> * Public interfaces other than eth2 do not forward traffic. There are two 
> iptables rules in the FORWARD chain of the filter table created for eth2 that 
> allow forwarding between eth2 and eth0. Equivalent rules are not created for 
> other public interfaces so forwarded traffic is dropped.
> * Outbound traffic from guest VMs does not honour static-NAT rules. Instead, 
> outbound traffic is source-NAT'd to the network's default source-NAT IP. New 
> connections from guests that are destined for public networks are processed 
> like so:
> 1. Traffic is matched against the following rule in the mangle table that 
> marks the connection with a 0x0:
> *mangle
> -A PREROUTING -i eth0 -m state --state NEW -j CONNMARK --set-xmark 0x0/0xffffffff
> 2. There are no "ip rule" statements that match a connection marked 0x0, so 
> the kernel routes the connection via the default gateway. That gateway is on 
> the source-NAT subnet, so the connection is routed out of eth2.
> 3. The following iptables rules are then matched in the filter table:
> *filter
> -A FORWARD -i eth0 -o eth2 -j FW_OUTBOUND
> -A FW_OUTBOUND -j FW_EGRESS_RULES
> -A FW_EGRESS_RULES -j ACCEPT
> 4. Finally, the following rule is matched from the nat table, where the IP 
> address is the source-NAT IP:
> *nat
> -A POSTROUTING -o eth2 -j SNAT --to-source 123.4.5.67
>  
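
Steps 1 and 2 of the quoted issue can be modelled roughly as follows. This is a deliberately simplified sketch, not how the kernel or CloudStack implements policy routing: real rules are evaluated in priority order, and the function name, the mark values, and the `Table_<dev>` names are assumptions chosen for illustration.

```python
def select_table(mark, fwmark_rules, default_table="main"):
    """Return the routing table consulted for a connection mark (simplified model)."""
    # A mark with no matching "ip rule" falls through to the main table.
    return fwmark_rules.get(mark, default_table)


# Hypothetical policy: marks 2 and 3 map to per-device tables.
rules = {0x2: "Table_eth2", 0x3: "Table_eth3"}

# A guest connection marked 0x0 matches nothing, so the main table's default
# gateway (on the source-NAT subnet) is used and traffic leaves via eth2.
print(select_table(0x0, rules))  # main

# A connection marked for eth3 would use that device's own table.
print(select_table(0x3, rules))  # Table_eth3
```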



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
