Hello,

On a SUSE real-time kernel (2.6.16.46-0.12-SLERT-10-15), we get the following kernel stack trace while running SDP traffic:

scheduling while atomic: ib_cm/4/0x00000001/18293

Call Trace:
      <ffffffff80324eed>{__sched_text_start+125}
      <ffffffff801375ca>{lock_timer_base+27}
      <ffffffff80327a03>{_spin_unlock_irqrestore+53}
      <ffffffff80137dc3>{__mod_timer+439}
      <ffffffff80326870>{schedule_timeout+208}
      <ffffffff80137fe5>{process_timeout+0}
      <ffffffff80327a43>{_spin_unlock_irq+52}
      <ffffffff80326232>{wait_for_completion_timeout+127}
      <ffffffff80126fd4>{default_wake_function+0}
      <ffffffff88515af6>{:mlx4_core:__mlx4_cmd+318}
      <ffffffff8851c148>{:mlx4_core:mlx4_mr_free+73}
      <ffffffff8852f0b8>{:mlx4_ib:mlx4_ib_dereg_mr+23}
      <ffffffff884cfb9b>{:ib_core:ib_dereg_mr+26}
      <ffffffff88633592>{:ib_sdp:sdp_destroy_qp+161}
      <ffffffff88633c6d>{:ib_sdp:sdp_reset_sk+276}
      <ffffffff88637f43>{:ib_sdp:sdp_cma_handler+2008}
      <ffffffff885f9224>{:ib_cm:cm_work_handler+0}
      <ffffffff886284f4>{:rdma_cm:cma_modify_qp_err+72}
      <ffffffff80125876>{__wake_up_common+62}
      <ffffffff80327a03>{_spin_unlock_irqrestore+53}
      <ffffffff885f9224>{:ib_cm:cm_work_handler+0}
      <ffffffff88629c25>{:rdma_cm:cma_ib_handler+369}
      <ffffffff885f7e52>{:ib_cm:cm_process_work+26}
      <ffffffff885f95fe>{:ib_cm:cm_work_handler+986}
      <ffffffff885f9224>{:ib_cm:cm_work_handler+0}
      <ffffffff8013f91e>{run_workqueue+154}
      <ffffffff80324e76>{__sched_text_start+6}
      <ffffffff8013ffb2>{worker_thread+0}
      <ffffffff8014162a>{keventd_create_kthread+0}
      <ffffffff801400ae>{worker_thread+252}
      <ffffffff80126fd4>{default_wake_function+0}
      <ffffffff8014162a>{keventd_create_kthread+0}
      <ffffffff8014190a>{kthread+212}
      <ffffffff8015865c>{hracct_exit_syscall+22}
      <ffffffff8010bd5e>{child_rip+8}
      <ffffffff8014162a>{keventd_create_kthread+0}
      <ffffffff80141836>{kthread+0}
      <ffffffff8010bd56>{child_rip+0}

The OFA kernel package in place is:

git://git.openfabrics.org/ofed_1_4/linux-2.6.git ofed_kernel
commit 88ab7955605c5e769e760f6bec980e0c2e72aa5c

Searching the latest kernel source for the "scheduling while atomic" message, we see that it is printed by __schedule_bug, which is called from this function:

/*
 * Various schedule()-time debugging checks and statistics:
 */
static inline void schedule_debug(struct task_struct *prev)
{
        /*
         * Test if we are atomic. Since do_exit() needs to call into
         * schedule() atomically, we ignore that path for now.
         * Otherwise, whine if we are scheduling when we should not be.
         */
        if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
                __schedule_bug(prev);

        profile_hit(SCHED_PROFILING, __builtin_return_address(0));

        schedstat_inc(this_rq(), sched_count);
#ifdef CONFIG_SCHEDSTATS
        if (unlikely(prev->lock_depth >= 0)) {
                schedstat_inc(this_rq(), bkl_count);
                schedstat_inc(prev, sched_info.bkl_count);
        }
#endif
}
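
To make our reading of this check concrete, here is a minimal user-space sketch of how we understand it. It is not kernel code. All the model_* names are hypothetical, the atomic test is reduced to "the preempt counter is nonzero", and we assume the fields in the message are task name, preempt count and pid. Under those assumptions, a call that may sleep (such as the wait_for_completion_timeout reached from __mlx4_cmd in the trace) made while the counter is 1 would produce a report of the form we see:

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for the per-task preempt counter. */
static int model_preempt_count;

/* Model of taking and releasing a spinlock: only the counter effect is kept. */
static void model_spin_lock(void)   { model_preempt_count++; }
static void model_spin_unlock(void) { model_preempt_count--; }

/*
 * Simplified model of the check in schedule_debug(): if the counter is
 * nonzero and the task is not exiting, format a report into buf and
 * return 1. Otherwise clear buf and return 0.
 * The field order (comm / preempt count / pid) is an assumption.
 */
static int model_schedule(const char *comm, int pid, int exit_state,
                          char *buf, size_t len)
{
        if (model_preempt_count != 0 && !exit_state) {
                snprintf(buf, len, "scheduling while atomic: %s/0x%08x/%d",
                         comm, (unsigned int)model_preempt_count, pid);
                return 1;
        }
        if (len > 0)
                buf[0] = '\0';
        return 0;
}

/* Model of a call that may sleep and therefore ends up in the scheduler. */
static int model_sleeping_call(char *buf, size_t len)
{
        return model_schedule("ib_cm/4", 18293, 0, buf, len);
}
```

Calling model_sleeping_call() with the counter at zero returns 0. Calling it between model_spin_lock() and model_spin_unlock() returns 1 and fills the buffer with the report. If this model is right, something on the path from cm_work_handler down to __mlx4_cmd is holding a lock (or has otherwise disabled preemption) when the command sleeps.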

Any idea what is going wrong here?

Thanks for your help,

Vincent



