Hi,

Linux version: 3.10.17

Problem statement: Timekeeping (do_timer) seems to have stopped, and the core that is aborting (core0 in this case) is stuck in a loop that relies on jiffies.

Root cause/reason: We have a tickless kernel, so a CPU that goes into a deep idle state stops its sched tick in tick_nohz_stop_sched_tick(). tick_sched_do_timer() should then let whichever CPU is still running take over the job of incrementing jiffies.

But when, say, core0 has raised a BUG, ipi_cpu_stop makes the other CPUs stop. clockevents_notify/tick_notify/hrtimer_notify all seem to be connected through cpu_chain eventually, but that code belongs to the hotplug path: only when cpu_down happens can it call tick_handover_do_timer, which takes the duty over from the dying CPU and assigns it to one that is online:

	static void tick_handover_do_timer(int *cpup)
	{
		if (*cpup == tick_do_timer_cpu) {
			int cpu = cpumask_first(cpu_online_mask);

			tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
				TICK_DO_TIMER_NONE;
		}
	}

Since cpu_down is not called in this case, the handover does not happen. The variable tick_do_timer_cpu is left pointing to a dead CPU (1, 2 or 3), and core0 waits forever in any code that relies on jiffies being incremented.

What is the right way to approach this problem? At first it looks like the kernel should take care of handing the jiffies job over to another online core, independent of hotplug.

Regards,
Oza.