OK, this keeps happening again and again! In this state LXD is not usable in production: I cannot restart LXD every few days.

I'll answer Fajar's questions from below here:

By "inbound" I mean connections from the host/internet to the container. Those work and keep working. I have port forwarding enabled.

By "outbound" I mean connections from the container to the host/internet. The latter start failing after some time (several days or so).

On the host: I can ping the container just fine.

In the container: I can ping lxdbr0:

root@taskd:~# ping 10.0.8.1
PING 10.0.8.1 (10.0.8.1) 56(84) bytes of data.
64 bytes from 10.0.8.1: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 10.0.8.1: icmp_seq=2 ttl=64 time=0.121 ms
64 bytes from 10.0.8.1: icmp_seq=3 ttl=64 time=0.144 ms

And "tcpdump -i lxdbr0" on the host shows:

11:29:23.570390 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 1, length 64
11:29:23.570459 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 1, length 64
11:29:24.569336 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 2, length 64
11:29:24.569386 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 2, length 64
11:29:25.568580 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 12901, seq 3, length 64
11:29:25.568630 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 12901, seq 3, length 64

However, I cannot ping an outside IP:

root@taskd:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

On the host I see:

11:30:14.343238 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 1, length 64
11:30:15.350848 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 2, length 64
11:30:16.352577 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 3, length 64
11:30:17.352640 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 4, length 64
11:30:18.352628 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 12902, seq 5, length 64

When trying to ping google.com I see:

11:30:52.847738 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 1, length 64
11:30:53.854716 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 2, length 64
11:30:54.862632 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 3, length 64
11:30:55.870632 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 4, length 64
11:30:56.878594 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 12903, seq 5, length 64

But at the same time I can ping google.com from the host!

After running

service lxd stop
service lxd-bridge stop
service lxd start

on the host, everything works again.

Here is the same "tcpdump -i lxdbr0" as above:

12:11:44.317375 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 1, length 64
12:11:44.317477 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 1, length 64
12:11:45.316620 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 2, length 64
12:11:45.316680 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 2, length 64
12:11:46.316587 IP 10.0.8.54 > 10.0.8.1: ICMP echo request, id 13076, seq 3, length 64
12:11:46.316645 IP 10.0.8.1 > 10.0.8.54: ICMP echo reply, id 13076, seq 3, length 64
12:11:55.044655 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 1, length 64
12:11:55.045254 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 1, length 64
12:11:56.044626 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 2, length 64
12:11:56.045285 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 2, length 64
12:11:57.044617 IP 10.0.8.54 > google-public-dns-a.google.com: ICMP echo request, id 13077, seq 3, length 64
12:11:57.045264 IP google-public-dns-a.google.com > 10.0.8.54: ICMP echo reply, id 13077, seq 3, length 64
12:12:15.553335 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 1, length 64
12:12:15.554093 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 1, length 64
12:12:16.554574 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 2, length 64
12:12:16.555275 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 2, length 64
12:12:17.553578 IP 10.0.8.54 > zrh04s08-in-f14.1e100.net: ICMP echo request, id 13078, seq 3, length 64
12:12:17.554337 IP zrh04s08-in-f14.1e100.net > 10.0.8.54: ICMP echo reply, id 13078, seq 3, length 64

I have removed the ARP requests because they are the same in both cases.

What could be happening to LXD after it's been running for a while?
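
In the broken state the echo requests to outside addresses show up on lxdbr0 but no replies come back, so one thing that could be checked the next time it happens is whether the NAT side of the bridge is still intact. What follows is only a sketch, not a confirmed diagnosis. It assumes that LXD_IPV4_NAT="true" is realised as an iptables MASQUERADE rule for 10.0.8.0/24 in the nat POSTROUTING chain, and that the host needs net.ipv4.ip_forward set to 1. The function names has_masquerade_rule and forwarding_enabled are made up for illustration.

```shell
#!/bin/sh
# Sketch only: two checks for things outbound NAT from lxdbr0 depends on.
# Assumption: NAT for the bridge is an iptables MASQUERADE rule in the
# nat POSTROUTING chain whose source is the bridge subnet.

# Reads rules in "iptables -S" format on stdin and succeeds if any line
# masquerades traffic coming from the subnet given as $1.
has_masquerade_rule() {
    subnet=$1
    found=1
    while IFS= read -r rule || [ -n "$rule" ]; do
        case "$rule" in
            *"-s $subnet "*"-j MASQUERADE"*) found=0 ;;
        esac
    done
    return "$found"
}

# Reads one value on stdin (as printed by "sysctl -n") and succeeds
# if it is 1, i.e. IPv4 forwarding is switched on.
forwarding_enabled() {
    IFS= read -r value
    [ "$value" = "1" ]
}

# Possible use on the host, as root, with the subnet taken from
# /etc/default/lxd-bridge:
#   iptables -t nat -S POSTROUTING | has_masquerade_rule 10.0.8.0/24 \
#       && echo "MASQUERADE rule present" || echo "MASQUERADE rule missing"
#   sysctl -n net.ipv4.ip_forward | forwarding_enabled \
#       && echo "forwarding on" || echo "forwarding off"
```

If either check fails while the containers have no outbound connectivity and passes again after the restart, that would point at something on the host removing the rule or resetting the sysctl rather than at LXD itself. If both pass, the cause is elsewhere.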

Thanks

-----"lxc-users" <[email protected]> wrote: -----
To: LXC users mailing-list <[email protected]>
From: "Fajar A. Nugraha"
Sent by: "lxc-users"
Date: 05/30/2016 7:14
Subject: Re: [lxc-users] LXD containers lose outbound network

On Sun, May 29, 2016 at 1:30 PM, <[email protected]> wrote:

> Hi
>
> My LXD has the following network configuration:
>
> root@qumind:~# egrep -v '(^#|^$)' /etc/default/lxd-bridge
> USE_LXD_BRIDGE="true"
> LXD_BRIDGE="lxdbr0"
> UPDATE_PROFILE="true"
> LXD_CONFILE=""
> LXD_DOMAIN="lxd"
> LXD_IPV4_ADDR="10.0.8.1"
> LXD_IPV4_NETMASK="255.255.255.0"
> LXD_IPV4_NETWORK="10.0.8.0/24"
> LXD_IPV4_DHCP_RANGE="10.0.8.2,10.0.8.254"
> LXD_IPV4_DHCP_MAX="253"
> LXD_IPV4_NAT="true"
> LXD_IPV6_ADDR=""
> LXD_IPV6_MASK=""
> LXD_IPV6_NETWORK=""
> LXD_IPV6_NAT="false"
> LXD_IPV6_PROXY="false"
>
> And the network works fine at first. However, after some time outbound connections fail, while inbound connections continue working. It affects all LXD containers.

What do you mean "outbound" and "inbound"?

From that setup, you have a NAT network. So other servers in your LAN shouldn't be able to access your containers, unless you also set up port forwarding (which you didn't mention). So "inbound" can't mean "other servers in your LAN accessing your container" in your case.

If by "inbound" you mean "even the host can't access the container", then something is definitely wrong. I'd start by using a simple "ping" test when that happens, coupled with "tcpdump" on both the host (lxdbr0 and veth*) and container (eth0) side.

> And it is not enough to just run
>
> root@qumind:~# service lxd-bridge stop
> Job for lxd-bridge.service canceled.
> root@qumind:~# service lxd restart
>
> while the containers are running. The behaviour stays the same.

Obviously. You can't delete a bridge that has interfaces attached (which is the case when containers are running).

> I have to stop the containers first, then restart the LXD bridge and start the containers again. Only then the outbound connections work again - until I have to restart it all again.
>
> What could be the culprit?

Start with the basics:
- test host <-> container networking first. Use "ping" and "tcpdump" to help
- look for error/weird messages at syslog, e.g. "iptables" or "conntrack"

> Thanks
>
> PS: To stop all running containers I am using
>
> for i in $(lxc list | grep RUNNING | awk -F'|' '{print $2}' | tr -d '[:blank:]'); do lxc stop "$i"; done
>
> I think it would be convenient to be able to just say
>
> lxc stop all

"service lxd stop" would stop all running containers before stopping lxd. And "service lxd start" after that will start containers that were previously started, as well as containers with boot.autostart: "true".

--
Fajar
_______________________________________________
lxc-users mailing list
[email protected]
http://lists.linuxcontainers.org/listinfo/lxc-users
