On 25.10.2013 11:02, Linus Torvalds wrote:
> Adding more people, so quoting the whole email for them.
>
> We definitely have some module unload issues. Guys, try the following
> a few times to unload modules:
>
>    lsmod | grep ' 0 '| cut -d' ' -f1 | xargs sudo rmmod
>
> (a few times because unloading one module will then potentially make
I do use a quite monolithic kernel with only a few modules, and one of
the machines is pretty stripped down: I was unable to trigger any
unusual kernel reaction within 10000 rmmod / modprobe cycles.

lsmod
=====
Module                     Size  Used by
ip6t_REJECT               12489  3
nf_conntrack_ipv6         13453  3
nf_defrag_ipv6            49936  1 nf_conntrack_ipv6
ip6table_raw              12565  1
ipt_REJECT                12485  3
xt_tcpudp                 12531  6
xt_pkttype                12456  3
xt_LOG                    17205  12
xt_limit                  12570  12
iptable_raw               12561  1
xt_CT                     12820  4
iptable_filter            12666  1
ip6table_mangle           12579  0
nf_conntrack_netbios_ns   12585  0
nf_conntrack_broadcast    12541  1 nf_conntrack_netbios_ns
nf_conntrack_ipv4         13655  3
nf_defrag_ipv4            12649  1 nf_conntrack_ipv4
ip_tables                 17713  2 iptable_raw,iptable_filter
xt_conntrack              12664  6
nf_conntrack              67920  6 nf_conntrack_ipv6,xt_CT,nf_conntrack_netbios_ns,nf_conntrack_broadcast,nf_conntrack_ipv4,xt_conntrack
ip6table_filter           12670  1
ip6_tables                17740  3 ip6table_raw,ip6table_mangle,ip6table_filter
x_tables                  21937  15 ip6t_REJECT,ip6table_raw,ipt_REJECT,xt_tcpudp,xt_pkttype,xt_LOG,xt_limit,iptable_raw,xt_CT,iptable_filter,ip6table_mangle,ip_tables,xt_conntrack,ip6table_filter,ip6_tables
snd_rme96                 24387  0
snd_hda_intel             34073  0
snd_hda_codec_realtek     41826  1
snd_hda_codec            129150  2 snd_hda_intel,snd_hda_codec_realtek
snd_pcm                   73096  3 snd_rme96,snd_hda_intel,snd_hda_codec
snd_timer                 24441  1 snd_pcm
snd_page_alloc            14230  2 snd_hda_intel,snd_pcm
snd                       58328  6 snd_rme96,snd_hda_intel,snd_hda_codec_realtek,snd_hda_codec,snd_pcm,snd_timer
soundcore                 14599  1 snd
binfmt_misc               13111  1
ipv6                     272895  24 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,ip6table_mangle

> other modules unloadable).
>
> On my machine, I can trigger this, for example:
>
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 3217 at fs/sysfs/file.c:498 sysfs_attr_ns+0x91/0xa0()
>   sysfs: kobject (null) without dirent
>   Modules linked in: fuse nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_$
>   CPU: 0 PID: 3217 Comm: rmmod Not tainted 3.12.0-rc6-00284-ge6036c0b8896 #19
>   Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
>    0000000000000009 ffff8800aca35df8 ffffffff8160aab5 ffff8800aca35e40
>    ffff8800aca35e30 ffffffff810514b8 ffffffffa013f080 ffff8801194a6040
>    0000000000000800 0000000000000000 0000000000c5b3e0 ffff8800aca35e90
>   Call Trace:
>    [<ffffffff8160aab5>] dump_stack+0x45/0x56
>    [<ffffffff810514b8>] warn_slowpath_common+0x78/0xa0
>    [<ffffffff81051527>] warn_slowpath_fmt+0x47/0x50
>    [<ffffffff810b5960>] ? module_refcount+0xb0/0xb0
>    [<ffffffff811e5c61>] sysfs_attr_ns+0x91/0xa0
>    [<ffffffff811e5d2a>] sysfs_remove_file+0x1a/0x50
>    [<ffffffff814c88a3>] cpufreq_sysfs_remove_file+0x13/0x30
>    [<ffffffffa013d350>] acpi_cpufreq_exit+0x2e/0xcde [acpi_cpufreq]
>    [<ffffffff810b7d1d>] SyS_delete_module+0x15d/0x2c0
>    [<ffffffff81002929>] ? do_notify_resume+0x59/0x90
>    [<ffffffff81618f62>] system_call_fastpath+0x16/0x1b
>   ---[ end trace f887112caaa5c4ab ]---
>
> so at least we have a cpufreq/sysfs interaction bug. There may be others.
>
> This particular cpufreq issue may be triggered by the fact that
> acpi-cpufreq isn't actually in use (pstate is). Or it might be some
> generic cpufreq/sysfs bug. Rafael, Greg, ideas?
>
> I don't see that this particular one would be the one that causes the
> timer issues, but it's an example of the fact that module unload tends
> to be special and not necessarily well tested.
>
>               Linus
>
> On Fri, Oct 25, 2013 at 9:38 AM, Linus Torvalds <[email protected]> wrote:
> > Hmm.. I just got a run_timer_softirq oops on my own laptop, slightly different.
> > That was not during shutdown, although there was a "yum upgrade"
> > finishing when that happened, so it's quite likely that there was a
> > service shutdown (and then restart). I think it's related.
> >
> > But my oops has almost no information: the IP that was jumped to was
> > bogus, and the callchain is just CPU idle followed by the softirq ->
> > run_timer_softirq handling, so there's no real way to see *what*
> > triggered it.
> >
> > The bad rip was ffffffffa051e250, which is not a valid code address.
> > It *might* be a module address, though. So this might be triggered by
> > rmmod on some module that doesn't remove all its timers...
> >
> > Ideas?
> >
> >               Linus
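
For anyone repeating the unload test: a minimal sketch of a variant of the lsmod one-liner quoted above that tests the use-count column (the third lsmod field) explicitly instead of matching ' 0 ' anywhere in the line. The function name unused_modules is made up for illustration and is not part of any tool. It reads lsmod-style text on stdin, so it can be tried on saved output without touching the running system.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not an existing tool):
# print the names of modules whose use count is 0.
# Input: lsmod-style text on stdin ("Module Size Used by" columns).
unused_modules() {
    # Require a numeric size column so the header line is skipped,
    # then compare the use count (third field) against 0.
    awk '$2 ~ /^[0-9]+$/ && $3 == 0 { print $1 }'
}

# Possible usage (needs root, unloads modules for real):
#   lsmod | unused_modules | xargs sudo rmmod
```

As in the quoted suggestion, the pipeline would have to be run a few times, since unloading one module can drop the use count of others to 0.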

