Re: [lxc-users] LXD containers lose outbound network

david . andel Mon, 27 Jun 2016 03:15:48 -0700

OK, this keeps happening again and again!
In this state LXD is not usable in production. I cannot restart LXD every few days.


I'll answer Fajar's questions from below here:

By "inbound" I mean connections from the host/internet to the container. Those 
work and keep working. I have port forwarding enabled.
By "outbound" I mean connections from the container to the host/internet. Those 
start failing after some time (several days or so).

On the host:
I can ping the container just fine.

In the container:
I can ping lxdbr0:

root@taskd:~# ping 10.0.8.1
PING 10.0.8.1 (10.0.8.1) 56(84) bytes of data.
64 bytes from 10.0.8.1: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 10.0.8.1: icmp_seq=2 ttl=64 time=0.121 ms
64 bytes from 10.0.8.1: icmp_seq=3 ttl=64 time=0.144 ms

And "tcpdump -i lxdbr0" on the host shows:
11:29:23.570390 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 1, length 64
11:29:23.570459 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 1, length 64
11:29:24.569336 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 2, length 64
11:29:24.569386 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 2, length 64
11:29:25.568580 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 3, length 64
11:29:25.568630 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 3, length 64

However, I cannot ping an outside IP:
root@taskd:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

On the host I see:
11:30:14.343238 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 1, length 64
11:30:15.350848 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 2, length 64
11:30:16.352577 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 3, length 64
11:30:17.352640 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 4, length 64
11:30:18.352628 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 5, length 64

When trying to ping google.com I see:
11:30:52.847738 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 1, length 64
11:30:53.854716 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 2, length 64
11:30:54.862632 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 3, length 64
11:30:55.870632 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 4, length 64
11:30:56.878594 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 5, length 64

But at the same time I can ping google.com from the host!
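
One way to put numbers on the captures above is to count echo requests and replies in the tcpdump text. The following is only a sketch: the function name icmp_summary is made up for illustration, and it assumes one packet per line (unwrapped output, as tcpdump prints it on a terminal).

```shell
#!/bin/sh
# Sketch: summarise ICMP echo traffic from tcpdump text output on stdin.
# Assumption: one packet per line, in the format shown in this thread.
# icmp_summary is a hypothetical helper name, not part of any tool.
icmp_summary() {
    awk '
        /ICMP echo request/ { req++ }
        /ICMP echo reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Two lines taken from the failing capture above (unwrapped).
icmp_summary <<'EOF'
11:30:14.343238 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 1, length 64
11:30:15.350848 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 2, length 64
EOF
# prints: requests=2 replies=0
```

On a live host the same function could be fed with something like "tcpdump -l -n -i lxdbr0 -c 10 icmp | icmp_summary". Requests without any replies on lxdbr0 suggest the packets are lost somewhere after the bridge, not inside the container.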

After running

service lxd stop
service lxd-bridge stop
service lxd start

on the host, everything works again.

Here is the same "tcpdump -i lxdbr0" as above:
12:11:44.317375 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 1, length 64
12:11:44.317477 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 1, length 64
12:11:45.316620 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 2, length 64
12:11:45.316680 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 2, length 64
12:11:46.316587 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 3, length 64
12:11:46.316645 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 3, length 64

12:11:55.044655 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 1, length 64
12:11:55.045254 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 1, length 64
12:11:56.044626 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 2, length 64
12:11:56.045285 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 2, length 64
12:11:57.044617 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 3, length 64
12:11:57.045264 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 3, length 64

12:12:15.553335 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 1, length 64
12:12:15.554093 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 1, length 64
12:12:16.554574 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 2, length 64
12:12:16.555275 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 2, length 64
12:12:17.553578 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 3, length 64
12:12:17.554337 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 3, length 64

The ARP requests I have removed, because they are the same in both cases.

What could be happening to LXD after it's been running for a while??

Thanks

-----"lxc-users" <[email protected]> wrote: -----
To: LXC users mailing-list <[email protected]>
From: "Fajar A. Nugraha" 
Sent by: "lxc-users" 
Date: 05/30/2016 7:14
Subject: Re: [lxc-users] LXD containers lose outbound network

On Sun, May 29, 2016 at 1:30 PM,  <[email protected]> wrote:
Hi

My LXD has the following network configuration:

root@qumind:~# egrep -v '(^#|^$)' /etc/default/lxd-bridge 
USE_LXD_BRIDGE="true"
LXD_BRIDGE="lxdbr0"
UPDATE_PROFILE="true"
LXD_CONFILE=""
LXD_DOMAIN="lxd"
LXD_IPV4_ADDR="10.0.8.1"
LXD_IPV4_NETMASK="255.255.255.0"
LXD_IPV4_NETWORK="10.0.8.0/24"
LXD_IPV4_DHCP_RANGE="10.0.8.2,10.0.8.254"
LXD_IPV4_DHCP_MAX="253"
LXD_IPV4_NAT="true"
LXD_IPV6_ADDR=""
LXD_IPV6_MASK=""
LXD_IPV6_NETWORK=""
LXD_IPV6_NAT="false"
LXD_IPV6_PROXY="false"

And the network works fine at first. However, after some time outbound 
connections fail, while inbound connections continue working.
It affects all LXD containers.

What do you mean "outbound" and "inbound"?

From that setup, you have a NAT network. So other servers in your LAN 
shouldn't be able to access your containers, unless you also set up port 
forwarding (which you didn't mention). So "inbound" can't mean "other servers 
in your LAN accessing your container" in your case.

If by "inbound" you mean "even the host can't access the container", then 
something is definitely wrong. I'd start by using a simple "ping" test when that 
happens, coupled with "tcpdump" on both the host (lxdbr0 and veth*) and 
container (eth0) side.

And it is not enough to just run 

root@qumind:~# service lxd-bridge stop
Job for lxd-bridge.service canceled.
root@qumind:~# service lxd restart

while the containers are running. The behaviour stays the same.


Obviously. You can't delete a bridge that has interfaces attached (which is the 
case when containers are running)

I have to stop the containers first, then restart the LXD bridge and start the 
containers again.
Only then do the outbound connections work again - until I have to restart it all 
again.

What could be the culprit?


Start with the basics:
- test host <-> container networking first. Use "ping" and "tcpdump" to help
- look for error/weird messages in syslog, e.g. "iptables" or "conntrack"
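
Since the /etc/default/lxd-bridge settings quoted above have LXD_IPV4_NAT="true", one thing that may be worth checking when outbound traffic stops (a guess at a possible cause, not a diagnosis) is whether a masquerade rule for 10.0.8.0/24 is still present. The sketch below is a hypothetical helper, has_masquerade, that inspects rule text on stdin; the sample rule is an assumed illustration of what such a rule might look like, not output from the host in this thread.

```shell
#!/bin/sh
# Sketch: succeed if the rule listing on stdin contains a MASQUERADE rule
# whose source is the given network. has_masquerade is a hypothetical name.
# Intended input: the output of "iptables -t nat -S POSTROUTING".
has_masquerade() {
    net="$1"
    grep -F -- "-s $net " | grep -q -- "-j MASQUERADE"
}

# Assumed sample rule text, for illustration only.
if has_masquerade 10.0.8.0/24 <<'EOF'
-P POSTROUTING ACCEPT
-A POSTROUTING -s 10.0.8.0/24 ! -d 10.0.8.0/24 -j MASQUERADE
EOF
then
    echo "NAT rule present"
else
    echo "NAT rule missing"
fi

# On a real host (as root) the check might be run as:
#   iptables -t nat -S POSTROUTING | has_masquerade 10.0.8.0/24
#   sysctl net.ipv4.ip_forward
```

Comparing the result while the problem is occurring with the result right after a restart would show whether the rule set changed in between.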

Thanks

PS:
To stop all running containers I am using 
for i in $(lxc list | grep RUNNING | awk -F'|' '{print $2}' | tr -d '[:blank:]'); 
do lxc stop "$i"; done
I think it would be convenient to be able to just say 
lxc stop all


"service lxd stop" would stop all running containers before stopping lxd. And 
"service lxd start" after that will start containers that were previously 
started, as well as containers with boot.autostart: "true"
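
For stopping only the running containers without touching the LXD service, the one-liner quoted above can be wrapped in a small function. This is a sketch under assumptions: running_containers and stop_all_running are hypothetical names, and the parsing assumes the default "lxc list" table with NAME as the first column and STATE as the second, as the one-liner already does. The "web" container in the sample table is invented for illustration.

```shell
#!/bin/sh
# Sketch: print the names of containers whose STATE column is RUNNING,
# reading the default "lxc list" table on stdin.
running_containers() {
    awk -F'|' '$3 ~ /^ *RUNNING *$/ { gsub(/ /, "", $2); print $2 }'
}

# Stop every running container, one at a time (not invoked here).
stop_all_running() {
    lxc list | running_containers | while read -r name; do
        lxc stop "$name"
    done
}

# Demo on a sample table; "web" is a made-up container.
running_containers <<'EOF'
+-------+---------+
| NAME  |  STATE  |
+-------+---------+
| taskd | RUNNING |
| web   | STOPPED |
+-------+---------+
EOF
# prints: taskd
```

Matching on the STATE column, and not on the whole line, avoids stopping a container that merely has "RUNNING" somewhere in its name.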

-- 
Fajar 
_______________________________________________
lxc-users mailing list
[email protected]
http://lists.linuxcontainers.org/listinfo/lxc-users
