On Thu, 2014-05-08 at 14:24 -0400, CDR wrote:
> The trace is showing it. Look for the word lxc-start
When you do that, point it out. Highlight it in the message. You see it,
but there's a lot of noise in the trace dump below. I couldn't spot it
and had to save the message as an mbox dump and do a search on it.

> On Thu, May 8, 2014 at 2:09 PM, Tamas Papp <[email protected]> wrote:
> >
> > On 05/08/2014 08:06 PM, CDR wrote:
> >> Ubuntu server blows up with LXC, and I am using the very latest kernel,
> >> 3.14.2
> >>
> >>
> >> [ 3798.345926] WARNING: CPU: 11 PID: 6963 at
> >> /home/apw/COD/linux/fs/sysfs/dir.c:52 sysfs_warn_dup+0x91/0xb0()
> >> [ 3798.345928] sysfs: cannot create duplicate filename
> >> '/devices/pci0000:00/0000:00:05.0/0000:02:00.1/net/eth1/upper_eth0'
> >> [ 3798.345930] Modules linked in: macvlan veth xt_conntrack ipt_REJECT
> >> ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle
> >> ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> >> nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc
> >> iptable_filter ip_tables x_tables dm_crypt gpio_ich dcdbas
> >> intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
> >> ghash_clmulni_intel aesni_intel aes_x86_64 bnep rfcomm lrw gf128mul
> >> psmouse glue_helper bluetooth ablk_helper cryptd 6lowpan_iphc
> >> serio_raw joydev ipmi_si i7core_edac acpi_power_meter edac_core
> >> lpc_ich mac_hid parport_pc ppdev lp parport ses enclosure hid_generic
> >> usbhid hid usb_storage bnx2 megaraid_sas wmi
> >> [ 3798.345989] CPU: 11 PID: 6963 Comm: lxc-start Not tainted
> >> 3.14.2-031402-generic #201404262053
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All that's saying is that this was the command in progress in user space
at the time of the kernel fault.

> >> [ 3798.345991] Hardware name: Dell Inc. PowerEdge R910/0KYD3D, BIOS
> >> 2.10.0 08/29/2013
> >> [ 3798.345993] 0000000000000034 ffff885fc4c0b5f8 ffffffff8175505c
> >> 0000000000000007
> >> [ 3798.346002] ffff885fc4c0b648 ffff885fc4c0b638 ffffffff8106cb5c
> >> ffff883fd02c44b0
> >> [ 3798.346008] ffff887fd2b03000 ffff887fd2b03000 ffff883fd02c44b0
> >> 0000000000000001
> >> [ 3798.346014] Call Trace:
> >> [ 3798.346025] [<ffffffff8175505c>] dump_stack+0x46/0x58
> >> [ 3798.346033] [<ffffffff8106cb5c>] warn_slowpath_common+0x8c/0xc0
> >> [ 3798.346037] [<ffffffff8106cc46>] warn_slowpath_fmt+0x46/0x50
> >> [ 3798.346044] [<ffffffff8137c750>] ? strlcat+0x60/0x80
> >> [ 3798.346047] [<ffffffff81245d41>] sysfs_warn_dup+0x91/0xb0
> >> [ 3798.346051] [<ffffffff812460c0>]
> >> sysfs_do_create_link_sd.isra.2+0xd0/0xe0
> >> [ 3798.346054] [<ffffffff812460f5>] sysfs_create_link+0x25/0x50
> >> [ 3798.346060] [<ffffffff816497b8>] netdev_adjacent_sysfs_add+0x58/0x70
> >> [ 3798.346068] [<ffffffff816502d4>] netdev_adjacent_rename_links+0xa4/0xc0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above 5 lines are not giving me a warm and fuzzy. What is that
container doing with sysfs?

> >> [ 3798.346071] [<ffffffff816503c3>] dev_change_name+0xd3/0x240
> >> [ 3798.346078] [<ffffffff8165e24b>] do_setlink+0x72b/0x790
> >> [ 3798.346082] [<ffffffff8165fffc>] rtnl_newlink+0x48c/0x6a0
> >> [ 3798.346085] [<ffffffff8165fc61>] ? rtnl_newlink+0xf1/0x6a0
> >> [ 3798.346093] [<ffffffff811694a3>] ? get_page_from_freelist+0x443/0x630
> >> [ 3798.346099] [<ffffffff8116db00>] ? __pagevec_lru_add_fn+0x220/0x220
> >> [ 3798.346102] [<ffffffff8165fa85>] rtnetlink_rcv_msg+0x165/0x250
> >> [ 3798.346108] [<ffffffff8163d337>] ? __alloc_skb+0x87/0x2a0
> >> [ 3798.346112] [<ffffffff8165f920>] ? __rtnl_unlock+0x20/0x20
> >> [ 3798.346120] [<ffffffff8167ce59>] netlink_rcv_skb+0xa9/0xd0
> >> [ 3798.346123] [<ffffffff8165cba5>] rtnetlink_rcv+0x25/0x40
> >> [ 3798.346127] [<ffffffff8167c3b8>] netlink_unicast+0x128/0x1d0
> >> [ 3798.346130] [<ffffffff8167caf4>] netlink_sendmsg+0x364/0x440
> >> [ 3798.346138] [<ffffffff8163478f>] sock_sendmsg+0xaf/0xc0
> >> [ 3798.346146] [<ffffffff81188cb9>] ? __do_fault+0x409/0x500
> >> [ 3798.346150] [<ffffffff81634e9c>] ___sys_sendmsg+0x3ac/0x3c0
> >> [ 3798.346155] [<ffffffff8118d173>] ? handle_mm_fault+0xb3/0x160
> >> [ 3798.346160] [<ffffffff8176619c>] ? __do_page_fault+0x28c/0x550
> >> [ 3798.346165] [<ffffffff8111df5c>] ? acct_account_cputime+0x1c/0x20
> >> [ 3798.346171] [<ffffffff810a66b9>] ? account_user_time+0x99/0xb0
> >> [ 3798.346175] [<ffffffff810a6d3d>] ? vtime_account_user+0x5d/0x70
> >> [ 3798.346183] [<ffffffff811ed6f3>] ? __fdget+0x13/0x20
> >> [ 3798.346187] [<ffffffff816358b9>] __sys_sendmsg+0x49/0x90
> >> [ 3798.346190] [<ffffffff81635919>] SyS_sendmsg+0x19/0x20
> >> [ 3798.346197] [<ffffffff8176b6bf>] tracesys+0xe1/0xe6
> >> [ 3798.346199] ---[ end trace 99513b106fc1cfe0 ]---

To my eye, this looks like a sysfs problem (which may very well be
container related) deep down in the kernel. It could be deeper still.
The request is passing through several netlink layers: a sendmsg() on a
netlink socket, handled by rtnl_newlink and do_setlink, ending up in
dev_change_name, i.e. an interface rename.

Under no circumstances should a user space application be able to
trigger a fault like this. By definition, it has to be a kernel fault,
maybe triggered by lxc-start, though I'm not sure I see how. Even if a
user application is doing something wrong, it should never be capable
of causing a fault like this. So the fault itself tells you that
something is not being handled properly in the kernel and, ergo, you
have a kernel problem.

> >> On Thu, May 8, 2014 at 12:29 PM, Tamas Papp <[email protected]> wrote:
> >
> > Why do you think, it's lxc related?
> >
> > t

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  [email protected]
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
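For anyone else who has to dig the user space command out of a dump like
this, here is a rough sketch. It assumes the dump is pasted as plain text,
possibly with `> >>` quote prefixes and with long lines wrapped the way
they are above. It pulls out the `Comm:` field (the task that was running
when the warning fired) and lists only the call-trace frames that are not
marked `?`. On x86 a leading `?` marks an address the unwinder found on the
stack but could not confirm as part of the call chain. The names
`unwrap` and `summarize_warning` are made up for illustration.

```python
import re

# Leading mail-quote markers such as "> >> " (assumed quoting style).
QUOTE = re.compile(r"^[>\s]*")
# Task name from the "CPU: .. PID: .. Comm: <name>" header line.
COMM = re.compile(r"\bComm:\s*(\S+)")
# A frame: "[<ffffffff81245d41>] sysfs_warn_dup+0x91/0xb0", optionally "? "-marked.
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def unwrap(text):
    """Strip quote markers and rejoin wrapped lines.

    Each kernel log record starts with a "[ timestamp]" prefix, so any
    line that does not start with "[" is treated as a continuation of
    the previous record.
    """
    records = []
    for raw in text.splitlines():
        line = QUOTE.sub("", raw).strip()
        if not line:
            continue
        if line.startswith("[") or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def summarize_warning(text):
    """Return (comm, frames): the Comm: task name and the unmarked frames,
    innermost first, in the order the kernel printed them."""
    comm = None
    frames = []
    for record in unwrap(text):
        m = COMM.search(record)
        if m and comm is None:
            comm = m.group(1)
        f = FRAME.search(record)
        if f and f.group(2) is None:  # skip "?" (unconfirmed) entries
            frames.append(f.group(3))
    return comm, frames
```

Run over the trace quoted above, the unmarked frames go from dump_stack
through sysfs_warn_dup and dev_change_name down to SyS_sendmsg and
tracesys, which is the netlink rename path without the `?` noise.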
_______________________________________________
lxc-users mailing list
[email protected]
http://lists.linuxcontainers.org/listinfo/lxc-users
