Cc'ing Lai, Gu and Kamezawa as they've been working in the area for a
while now.  Gu, is this related to what you've been working on?

Thanks.

On Fri, May 08, 2015 at 07:16:40PM +0800, Song Xiumiao wrote:
> From: songxiumiao <[email protected]>
> 
> Analysing the call trace of the bug, we found that create_worker()
> allocates memory from node0. Because node0 is offline, the allocation
> fails. Add a condition that checks the node is online, so that the
> system only allocates memory from an online node.
> 
> The bug information follows:
> [root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
> [  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
> [18446744029.482996] kvm: enabling virtualization on CPU90
> [  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
> [  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning off TSC clock.
> [  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [  225.755126] BUG: unable to handle kernel paging request at 0000000000001b08
> [  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
> [  225.775248] Oops: 0000 [#1] SMP
> [  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
> [  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 4.0.0-rc4-bug-fixed-remove #16
> [  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
> [  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: ffff880446120000
> [  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
> [  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 0000000000000000
> [  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000002052d0
> [  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 0000000060eca101
> [  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 000000000001002a
> [  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 00000000000040d0
> [  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) knlGS:0000000000000000
> [  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 00000000001407e0
> [  225.975735] Stack:
> [  225.977981]  00000000002052d0 0000000000000000 0000000000000003 ffff88045a3d8da0
> [  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 0000000000000000
> [  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 002052d0002052d0
> [  226.002904] Call Trace:
> [  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
> [  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
> [  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
> [  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
> [  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
> [  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
> [  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
> [  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
> [  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
> [  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
> [  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
> [  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
> [  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
> [  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
> [  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
> [  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
> [  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
> [  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
> [  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
> [  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
> [  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
> [  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
> [  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
> [  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
> [  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
> [  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
> [  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
> [  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
> [  226.206576]  RSP <ffff880446123918>
> [  226.210471] CR2: 0000000000001b08
> [  226.227939] ---[ end trace 30d753e1e1124696 ]---
> [  226.412591] Kernel panic - not syncing: Fatal exception
> [  226.430948] Kernel Offset: disabled
> [  226.434845] drm_kms_helper: panic occurred, switching back to text console
> [  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
> [  226.625047] ------------[ cut here ]------------
> [  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x5d/0x60()
> [  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
> [  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         4.0.0-rc4-bug-fixed-remove #16
> [  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, BIOS Brickland.05.04.15.0024 02/28/2015
> [  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 ffffffff8167aa08
> [  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 ffffffff810772da
> [  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 000000000000002b
> [  226.776480] Call Trace:
> [  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
> [  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
> [  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
> [  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
> [  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
> [  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
> [  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
> [  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
> [  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
> [  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
> [  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
> [  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
> [  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
> [  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
> [  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
> [  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
> [  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
> [  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
> [  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
> [  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
> [  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
> [  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
> [  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
> [  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
> [  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
> [  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
> [  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
> [  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
> [  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
> [  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
> [  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
> [  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
> [  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
> [  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
> [  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
> [  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
> [  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
> [  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
> [  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
> [  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
> [  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
> [  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
> [  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
> [  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
> [  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
> [  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
> [  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
> [  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
> [  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
> [  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
> [  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
> [  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
> [  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
> [  227.127306] ---[ end trace 30d753e1e1124697 ]---
> 
> Signed-off-by: Song Xiumiao <[email protected]>
> Signed-off-by: Gong Zhaogang <[email protected]>
> Tested-by: Liu Changsheng <[email protected]>
> Reviewed-by: xiaofeng.yan <[email protected]>
> Reviewed-by: Fan Dongdong <[email protected]>
> ---
>  kernel/workqueue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 586ad91..cae6277 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>       if (wq_numa_enabled) {
>               for_each_node(node) {
>                       if (cpumask_subset(pool->attrs->cpumask,
> -                                        wq_numa_possible_cpumask[node])) {
> +                                        wq_numa_possible_cpumask[node]) &&
> +                                        node_online(node)) {
>                               pool->node = node;
>                               break;
>                       }
> -- 
> 1.9.1
> 
> 
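
For readers following the quoted hunk, the check it adds can be modelled
in user space roughly as below. This is only a sketch under stated
assumptions, not kernel code: MAX_NODES, node_possible_cpus,
node_is_online, mask_subset() and pick_pool_node() are hypothetical
stand-ins for the for_each_node() loop, wq_numa_possible_cpumask[],
node_online() and cpumask_subset() used in the patch, and the masks are
made-up values.

```c
#include <stdbool.h>

#define MAX_NODES    4
#define NUMA_NO_NODE (-1)

/* Hypothetical per-node "possible CPUs" masks, one bit per CPU. */
static const unsigned long node_possible_cpus[MAX_NODES] = {
	0x000FUL, 0x00F0UL, 0x0F00UL, 0xF000UL
};

/* Hypothetical online state: node 0 is offline, as in the report. */
static const bool node_is_online[MAX_NODES] = { false, true, true, false };

/* True if every CPU set in @a is also set in @b. */
static bool mask_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/*
 * Pick the node a pool should allocate from: the first node whose
 * possible CPUs cover the pool's CPUs and which is also online.
 * Falls back to NUMA_NO_NODE when no such node exists.
 */
static int pick_pool_node(unsigned long pool_cpus)
{
	int node;

	for (node = 0; node < MAX_NODES; node++) {
		if (mask_subset(pool_cpus, node_possible_cpus[node]) &&
		    node_is_online[node])
			return node;
	}
	return NUMA_NO_NODE;
}
```

With these made-up masks, a pool restricted to CPUs of the offline node
0 ends up with NUMA_NO_NODE instead of node 0, while a pool restricted
to CPUs of node 1 still gets node 1.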

-- 
tejun