On Tue, Nov 12, 2013 at 8:29 AM, Wang, Xiaoming <xiaoming.w...@intel.com> wrote:
> cfs_rq is set to the entity's group run queue, but that queue's
> cfs_rq->nr_running may be zero, which causes the panic in
> pick_next_task_fair.
> So cfs_rq->nr_running needs to be checked as well.
>
> [15729.985797] BUG: unable to handle kernel NULL pointer dereference at 00000008
> [15729.993838] IP: [<c15266f1>] rb_next+0x1/0x50
> [15729.998745] *pdpt = 000000002861a001 *pde = 0000000000000000
> [15730.005221] Oops: 0000 [#1] PREEMPT SMP
> [15730.009677] Modules linked in: atomisp_css2400b0_v2 lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core lm_dump(O) bcm_bt_lpm hdmi_audio bcm4334x(O) kct_daemon(O)
> [15730.028159] CPU: 1 PID: 2510 Comm: mts Tainted: G W O 3.10.16-261326-g88236a2 #1
> [15730.037215] task: e86ff080 ti: e83ac000 task.ti: e83ac000
> [15730.043261] EIP: 0060:[<c15266f1>] EFLAGS: 00010046 CPU: 1
> [15730.049402] EIP is at rb_next+0x1/0x50
> [15730.053602] EAX: 00000008 EBX: f3655950 ECX: 004c090e EDX: 00000000
> [15730.060607] ESI: 00000000 EDI: 00000000 EBP: e83ada44 ESP: e83ada28
> [15730.067623] DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> [15730.073668] CR0: 80050033 CR2: 00000008 CR3: 28095000 CR4: 001007f0
> [15730.080684] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [15730.087699] DR6: ffff0ff0 DR7: 00000400
> [15730.091994] Stack:
> [15730.094251] e83ada44 c12719f0 004c090e f3655900 e86ff334 f3655900 00000002 e83adacc
> [15730.103086] c1ae384f f3655900 0000254c b7581800 6be38330 0000004e 00000e4e c20d6900
> [15730.111922] f3655950 c20d6900 f3655900 e86ff080 f1d40600 cfcfa794 e83ada90 e83ada8c
> [15730.120754] Call Trace:
> [15730.123502] [<c12719f0>] ? pick_next_task_fair+0xf0/0x130
> [15730.129647] [<c1ae384f>] __schedule+0x11f/0x800
> [15730.134821] [<c12c7421>] ? tracer_tracing_is_on+0x11/0x30
> [15730.140964] [<c12c74ad>] ? tracing_is_on+0xd/0x10
> [15730.146331] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.152185] [<c1266394>] ? finish_task_switch+0x54/0xb0
> [15730.158136] [<c1ae3fa3>] schedule+0x23/0x60
> [15730.162920] [<c1ae16e5>] schedule_timeout+0x165/0x280
> [15730.168676] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.174529] [<c1ae350f>] wait_for_completion+0x6f/0xc0
> [15730.180382] [<c126b3e0>] ? try_to_wake_up+0x250/0x250
> [15730.186139] [<c1255658>] flush_work+0xa8/0x110
> [15730.191214] [<c1253fc0>] ? worker_pool_assign_id+0x40/0x40
> [15730.197457] [<c15c3955>] tty_flush_to_ldisc+0x25/0x30
> [15730.203212] [<c15bde18>] n_tty_poll+0x68/0x180
> [15730.208288] [<c15bddb0>] ? process_echoes+0x2c0/0x2c0
> [15730.214044] [<c15bb2fb>] tty_poll+0x6b/0x90
> [15730.218828] [<c15bddb0>] ? process_echoes+0x2c0/0x2c0
> [15730.224584] [<c1339862>] do_sys_poll+0x202/0x440
> [15730.229856] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.235710] [<c13234a1>] ? kmem_cache_free+0x71/0x180
> [15730.241466] [<c13d2bfa>] ? jbd2_journal_stop+0x25a/0x370
> [15730.247513] [<c13d2bfa>] ? jbd2_journal_stop+0x25a/0x370
> [15730.253561] [<c13bb2df>] ? __ext4_journal_stop+0x5f/0x90
> [15730.259608] [<c139787d>] ? ext4_dirty_inode+0x4d/0x60
> [15730.265364] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.271218] [<c13549ac>] ? generic_write_end+0xac/0x100
> [15730.277168] [<c13bb2df>] ? __ext4_journal_stop+0x5f/0x90
> [15730.283216] [<c1338780>] ? __pollwait+0xd0/0xd0
> [15730.288388] [<c1338780>] ? __pollwait+0xd0/0xd0
> [15730.293561] [<c1338780>] ? __pollwait+0xd0/0xd0
> [15730.298734] [<c1338780>] ? __pollwait+0xd0/0xd0
> [15730.303908] [<c12ecd85>] ? __generic_file_aio_write+0x245/0x470
> [15730.310635] [<c12ed059>] ? generic_file_aio_write+0xa9/0xd0
> [15730.316975] [<c138c910>] ? ext4_file_write+0xc0/0x460
> [15730.322730] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.328583] [<c125d09f>] ? remove_wait_queue+0x3f/0x50
> [15730.334436] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.340289] [<c12615e6>] ? __srcu_read_lock+0x66/0x90
> [15730.346045] [<c1ae8285>] ? sub_preempt_count+0x55/0xe0
> [15730.351899] [<c15407f6>] ? __percpu_counter_add+0x96/0xe0
> [15730.358043] [<c1329df1>] ? __sb_end_write+0x31/0x70
> [15730.363603] [<c13285c5>] ? vfs_write+0x165/0x1c0
> [15730.368874] [<c1339b4a>] SyS_poll+0x5a/0xd0
> [15730.373658] [<c1ae52a8>] syscall_call+0x7/0xb
> [15730.378639] [<c1ae0000>] ? add_sysfs_fw_map_entry+0x2f/0x85
>
> Signed-off-by: xiaoming wang <xiaoming.w...@intel.com>
> Signed-off-by: Zhang Dongxing <dongxing.zh...@intel.com>
> ---
>  kernel/sched/fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7c70201..2d440f9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3708,7 +3708,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
>                 se = pick_next_entity(cfs_rq);
>                 set_next_entity(cfs_rq, se);
>                 cfs_rq = group_cfs_rq(se);
> -       } while (cfs_rq);
> +       } while (cfs_rq && cfs_rq->nr_running);
>
>         p = task_of(se);
>         if (hrtick_enabled(rq))

This can only happen when something else has already corrupted the
rb-tree.  Breaking out here is going to cause you to instead try
treating a group entity as a task, which will crash just as badly.
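
To make that concrete, here is a minimal userspace sketch of the walk with
the proposed loop condition. It is not kernel code: the struct layouts, the
"first" field standing in for the rb-tree's leftmost entity, and the name
pick_with_nr_running_check() are all assumptions made up for illustration.
Only group_cfs_rq(), task_of() and the loop shape mirror the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of the hierarchy walk; these are NOT the
 * kernel's real definitions, only enough structure to show the control flow. */
struct cfs_rq;

struct sched_entity {
	struct cfs_rq *my_q;		/* non-NULL only for a group entity */
};

struct cfs_rq {
	unsigned int nr_running;
	struct sched_entity *first;	/* hypothetical stand-in for the rb-tree */
};

struct task_struct {
	int pid;
	struct sched_entity se;
};

static struct cfs_rq *group_cfs_rq(struct sched_entity *se)
{
	return se->my_q;
}

static int entity_is_task(const struct sched_entity *se)
{
	return se->my_q == NULL;
}

/* Only meaningful when entity_is_task(se) is true. */
static struct task_struct *task_of(struct sched_entity *se)
{
	return (struct task_struct *)((char *)se - offsetof(struct task_struct, se));
}

/* Descend the hierarchy using the loop condition from the quoted patch. */
static struct sched_entity *pick_with_nr_running_check(struct cfs_rq *cfs_rq)
{
	struct sched_entity *se;

	do {
		se = cfs_rq->first;
		assert(se != NULL);	/* in this model, never pick from an empty queue */
		cfs_rq = group_cfs_rq(se);
	} while (cfs_rq && cfs_rq->nr_running);

	return se;
}
```

If the root queue's first entity is a group entity whose own queue has
nr_running == 0, the loop stops early and returns that group entity, so
entity_is_task() is false for it. In this model, calling task_of() on that
result computes a pointer to a task_struct that does not exist, which is the
failure described above. With a consistent hierarchy the same function walks
down to a real task entity.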

Could you describe what was being run when this crash occurred?

> --
> 1.7.1
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/