On 12/08/2015 08:00 AM, Serge Hallyn wrote:
Ok, can you change the launch command in the scripts to

lxc-start -n $containername -L /tmp/$containername.cout -l trace -o 
/tmp/$containername.dout -- /sbin/init log_target=console log_level=debug

The console output will go into the .cout file and lxc debug output into .dout.

I've actually made some progress in reproducing this outside of my framework. I originally thought the problem only occurred during the first boot of the containers. I've discovered that it can happen any time the server is rebooted and the containers are started when the server comes up. I've only seen this problem when multiple containers are starting at the same time.

I incorporated your modified start command into a test as follows:

# for vm in `lxc-ls`; do lxc-start -n $vm -L /tmp/$vm.cout -l trace -o /tmp/$vm.dout -- /sbin/init log_target=console log_level=debug; done

This starts all of my previously created containers at roughly the same time, and when I do this some of the containers encounter the systemd errors I've been seeing. Which containers hit these errors vary from test to test. In looking at the .dout logs, I noticed the following:

lxc-start 1449591253.647 DEBUG lxc_conf - conf.c:setup_rootfs:1295 - mounted '/hf/cs/vm-00/rootfs' on '/usr/lib64/lxc/rootfs' lxc-start 1449591253.647 INFO lxc_conf - conf.c:setup_utsname:928 - 'pws-vm-00' hostname has been setup lxc-start 1449591253.660 DEBUG lxc_conf - conf.c:setup_hw_addr:2368 - mac address 'fe:d6:e8:96:7e:2d' on 'eth0' has been setup lxc-start 1449591253.660 DEBUG lxc_conf - conf.c:setup_netdev:2595 - 'eth0' has been setup lxc-start 1449591253.660 INFO lxc_conf - conf.c:setup_network:2616 - network has been setup lxc-start 1449591253.660 INFO lxc_conf - conf.c:mount_autodev:1157 - Mounting container /dev lxc-start 1449591253.661 INFO lxc_conf - conf.c:mount_autodev:1179 - Mounted tmpfs onto /usr/lib64/lxc/rootfs/dev lxc-start 1449591253.661 INFO lxc_conf - conf.c:mount_autodev:1197 - Mounted container /dev lxc-start 1449591253.661 ERROR lxc_utils - utils.c:open_without_symlink:1626 - No such file or directory - Error examining fuse in /usr/lib64/lxc/rootfs/sys/fs/fuse/connections lxc-start 1449591253.661 INFO lxc_conf - conf.c:mount_entry:1727 - failed to mount '/sys/fs/fuse/connections' on '/usr/lib64/lxc/rootfs/sys/fs/fuse/connections' (optional): No such file or directory

All of the containers report this error, but what caught my eye is the mount point referenced in this error. Is this same mount point used for all containers that are being started? I assume this error is misleading but I tried changing my for loop to add a 1 second delay in starting each container, and after doing this there were no systemd errors. Unfortunately, adding a delay in my install framework had no effect, so I suspect the apparent positive results in adding a delay to the for loop was just luck.

This same container reported the following errors in its /var/log/messages:

Dec  8 08:06:39 pws-vm-00 systemd: Starting Dump dmesg to /var/log/dmesg...
Dec 8 08:06:39 pws-vm-00 systemd: Failed at step CGROUP spawning /etc/rc.d/init.d/jexec: No such file or directory
Dec  8 08:06:39 pws-vm-00 systemd: Starting Permit User Sessions...
Dec  8 08:06:39 pws-vm-00 systemd: Starting LSB: Bring up/down networking...
Dec 8 08:06:39 pws-vm-00 systemd: jexec.service: control process exited, code=exited status=219 Dec 8 08:06:39 pws-vm-00 systemd: Failed to start LSB: Supports the direct execution of binary formats..
Dec  8 08:06:39 pws-vm-00 systemd: Unit jexec.service entered failed state.

The .cout file for this same container didn't really have any thing of note, other than it also reported some of these errors:

Starting LSB: Bring up/down networking...
Starting D-Bus System Message Bus...
OK  Started D-Bus System Message Bus.
FAILED Failed to start LSB: Supports the direct execution of binary formats..
See 'systemctl status jexec.service' for details.

There was nothing related to this error in the .dout file. The start of this file does have some warnings:

lxc-start 1449590798.820 INFO lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/vm-00/config lxc-start 1449590798.822 WARN lxc_confile - confile.c:config_pivotdir:1801 - lxc.pivotdir is ignored. It will soon become an error. lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup cpuset unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup cpu unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup blkio unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup memory unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup devices unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup freezer unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup net_cls unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup perf_event unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 WARN lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup hugetlb unknown to /var/lib/lxc vm-00 lxc-start 1449590798.823 INFO lxc_start - start.c:lxc_check_inherited:226 - closed inherited fd 4 lxc-start 1449590798.825 INFO lxc_container - lxccontainer.c:do_lxcapi_start:712 - Attempting to set proc title to [lxc monitor] /var/lib/lxc vm-00 lxc-start 1449590798.825 ERROR lxc_utils - utils.c:setproctitle:1461 - Invalid argument - setting cmdline failed

and there were some later on related to secomp:

lxc-start 1449590798.825 INFO lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for finit_module action 327681 lxc-start 1449590798.825 WARN lxc_seccomp - seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall: finit_module lxc-start 1449590798.825 WARN lxc_seccomp - seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted lxc-start 1449590798.825 INFO lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for finit_module action 327681 lxc-start 1449590798.825 WARN lxc_seccomp - seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall: finit_module lxc-start 1449590798.825 WARN lxc_seccomp - seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted lxc-start 1449590798.825 INFO lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .delete_module errno 1.

along with a few other unrelated warnings. There are no errors in any of the logs that point to an obvious cause for these system errors, at least not to my eyes. Plus, I get a completely different set of failed VMs each time I run through an install, with different services being impacted each time. Is there anything in particular I should look for in the .cout and .dout logs that might help explain what's going on?

_______________________________________________
lxc-users mailing list
[email protected]
http://lists.linuxcontainers.org/listinfo/lxc-users

Reply via email to