On 29/07/2016 20:11, Brian Candler wrote:
I think I have this working by using proxyarp instead of bridging.

On the EC2 VM: leave lxdbr0 unconfigured. Then do:

sysctl net.ipv4.conf.all.forwarding=1
sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
ip route add 10.0.0.40 dev lxdbr0
ip route add 10.0.0.41 dev lxdbr0
# where 10.0.0.40 and 10.0.0.41 are the IP addresses of the containers

The containers are statically configured with those IP addresses, and 10.0.0.1 as gateway.
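(For reference, the guest side of that might look something like the sketch below - this assumes an Ubuntu container still using ifupdown and a /24 VPC subnet; adjust to taste.)

auto eth0
iface eth0 inet static
# the container's assigned secondary IP, with the VPC router as gateway
address 10.0.0.40
netmask 255.255.255.0
gateway 10.0.0.1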

This is sufficient to allow connectivity between the containers and other VMs in the same VPC - yay!

At this point, the containers *don't* have connectivity to the outside world. I can see the packets are being sent out with the correct source IP address (the container's) and MAC address (the EC2 VM's), so I presume that the NAT in EC2 is only capable of working with the primary IP address - that's reasonable, if it's 1:1 NAT without overloading.

So there's also a need for iptables rules to NAT the container's address to the EC2 VM's address when talking to the outside world:

iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE
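Note that the order matters: the ACCEPT rule has to come before the MASQUERADE rule, so that private-to-private traffic is exempted from NAT. If you want to check that packets really are hitting these rules, the per-rule counters are one way:

iptables -t nat -L POSTROUTING -n -v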

And hey presto: containers with connectivity, albeit fairly heavily frigged.


I now have it working with multiple NICs on the same VM, which lets you run more containers. It did however turn out to be rather more painful to set up, and I'm documenting it here in case it's useful for anyone else.

A t2.medium instance can have up to three NICs, and each one can have up to six IP addresses - one primary and five secondary. You can either let the AWS console pick random ones from your VPC range, or enter your own choices.
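If you'd rather script this than click through the console, the AWS CLI can add secondary addresses to an existing NIC - a sketch, with a placeholder interface ID:

aws ec2 assign-private-ip-addresses \
  --network-interface-id eni-0123456789abcdef0 \
  --private-ip-addresses 10.0.0.40 10.0.0.41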

One wrinkle is that when you have two or more NICs, you *must* use an elastic IP for your first NIC's primary address - it can't map to a dynamic public IP any more. That's not a big deal (although if you've reached your limit for elastic IPs, you have to raise a ticket with Amazon to ask for more).

Note that the primary address on any NIC cannot be changed without deleting and recreating the NIC. So if you want the flexibility to move an IP address over to a different VM, then you should really set the first address to something fixed and consider it wasted.

That still lets you run 15 containers on a single t2.medium instance though - potentially a 15:1 cost saving, if you have enough resources.

Next: eth0, eth1 and eth2 are all going to sit on the same IP subnet, but the default route should only go via eth0. So you want to assign a higher "metric" to eth1/eth2, so that the default route always uses eth0. On Ubuntu: "apt-get install ifmetric" and then configure your interfaces like this:

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
# eth1's primary address
address 10.0.0.211/24
metric 100

... ditto for eth2 if using it
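For example, an eth2 stanza might look like this (the address is only an example):

auto eth2
iface eth2 inet static
address 10.0.0.221/24
metric 100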

(I leave eth0 on dhcp because that makes it less likely that I'll lock myself out with a bad configuration.)

Now, the problem comes with the robust way that EC2 does traffic security. You may have created three NICs with up to six addresses each, but EC2 only accepts a packet leaving a particular virtual NIC if its source IP address is one of the addresses assigned to that NIC. Also, its source MAC address must be that NIC's MAC address.

These packets are going to originate from containers inside your VM, and each container doesn't know or care which interface its traffic will be forwarded through.

Fixing this requires source routing rules <https://groups.google.com/forum/#%21topic/ganeti/qVMZFbH1X54>. In the following example, there are six containers using secondary addresses on this VM:

* eth0 primary address 10.0.0.201, secondary addresses 10.0.0.{21,23,53,49,89}

* eth1 primary address 10.0.0.211, secondary address 10.0.0.14

(eth2 is not being used)


Run the following command to create a routing table:

   echo "200 force-eth1" >>/etc/iproute2/rt_tables

Add to /etc/rc.local:

# Policy routing to force traffic with eth1 source address out of eth1
for addr in 10.0.0.14; do
  ip rule add from "$addr" table force-eth1
done

# Internet traffic, which is masqueraded, goes via eth0
ip route add default via 10.0.0.1 dev eth0 metric 100 table force-eth1
# Non-masqueraded traffic goes via eth1
ip route add 10.0.0.0/24 dev eth1  proto kernel  scope link  table force-eth1
ip route add 10.0.0.0/8 via 10.0.0.1 dev eth1 metric 100 table force-eth1
ip route flush cache

Check:

# ip rule list
0:      from all lookup local
32765:  from 10.0.0.14 lookup force-eth1
32766:  from all lookup main
32767:  from all lookup default

# ip route list table main
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0  proto kernel  scope link  src 10.0.0.201
10.0.0.0/24 dev eth1  proto kernel  scope link  src 10.0.0.211  metric 100
10.0.0.14 dev lxdbr0  scope link
10.0.0.21 dev lxdbr0  scope link
10.0.0.23 dev lxdbr0  scope link
10.0.0.49 dev lxdbr0  scope link
10.0.0.53 dev lxdbr0  scope link
10.0.0.89 dev lxdbr0  scope link

# ip route list table force-eth1
default via 10.0.0.1 dev eth0  metric 100
10.0.0.0/24 dev eth1  proto kernel  scope link
10.0.0.0/8 via 10.0.0.1 dev eth1  metric 100
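You can also ask the kernel which route it would pick for a forwarded packet with a given source address - the "iif" makes the lookup behave as for forwarded rather than locally-generated traffic; 8.8.8.8 and 10.0.0.99 are just example destinations:

# Internet-bound: should resolve to the default route via 10.0.0.1 dev eth0
ip route get 8.8.8.8 from 10.0.0.14 iif lxdbr0
# VPC-bound: should resolve to dev eth1
ip route get 10.0.0.99 from 10.0.0.14 iif lxdbr0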

Finally, add to /etc/rc.local the static routes and proxy ARP required for LXD container networking.

sysctl net.ipv4.conf.all.forwarding=1
sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
for addr in 10.0.0.21 10.0.0.23 10.0.0.49 10.0.0.89 10.0.0.53 10.0.0.14; do
  ip route add "$addr" dev lxdbr0
done
# Masquerading for containers (except for our primary IP address)
iptables -t nat -A POSTROUTING -s 10.0.0.201 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j MASQUERADE


These rules achieve the following:

* traffic from a container which is going to another private address (e.g. another VM in the same subnet, or another subnet in your VPC) will retain its original source address. The source routing ensures it is sent out via eth0, eth1 or eth2 depending on which source address is being used.

* traffic from a container which is going out to the public Internet will be NAT'd to the eth0 primary address and will leave via eth0, so that it can in turn be NAT'd to the elastic public IP of the EC2 VM.
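
One way to double-check what actually goes out on the wire is to watch the outside interfaces while generating traffic from a container (again, 8.8.8.8 and 10.0.0.99 are only example destinations):

# Internet-bound traffic from any container should appear on eth0 with source 10.0.0.201
tcpdump -ni eth0 host 8.8.8.8
# traffic from the 10.0.0.14 container to another VPC address should appear on eth1, source 10.0.0.14
tcpdump -ni eth1 host 10.0.0.99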

Then you reboot and cross your fingers that you haven't locked yourself out. I did this quite a few times until I got it right :-)

There's no console on EC2 VMs, so if it's broken, either you have to detach your EBS volume and attach it to another temporary instance, or you just blow the instance away and start again.

Cheers,

Brian Candler.
