* [email protected] ([email protected]) wrote:
> Hey Mathieu:
>
> Thanks for looking at this. I'm a bit new to debugging at this level, so
> you may need to tell me a bit more about what you need. I attempted
> to use "pstack" on the lttctl and lttd tasks ... no luck, as pstack also
> locked up.
>
> I put a bit of tracing into liblttctl and discovered that the lockup occurs
> when the trace name (whatever it happens to be) is written to the
> "/mnt/debugfs/ltt/destroy_trace" file.
>
> I'm guessing that you would like some tracing of the ltt kernel module.
> Is there something that I can turn on, or another way I could get a
> stack dump of that module after the lockup? I'll do a little research this
> weekend on kernel debugging techniques.
>
> I can certainly sprinkle some printk statements into the ltt kernel
> module source. Doing so provided the following info:
>
> - Control entered _ltt_trace_destroy (single underscore)
> - Control entered del_timer_sync(&ltt_async_wakeup_timer) and never
>   returned
>
> Does that help, or should I continue farther down this path?
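JP's finding (control enters del_timer_sync() and never returns) fits the deadlock described in the patch below. The teardown path holds the traces mutex while it waits synchronously for the timer handler, and on PREEMPT_RT that handler takes the same mutex. The userspace C sketch that follows models only that lock ordering, under these assumptions:

- A POSIX mutex stands in for ltt_lock_traces().
- pthread_join() stands in for del_timer_sync().
- The mutex-taking handler uses pthread_mutex_timedlock() with a one-second deadline, so the demo reports the deadlock instead of hanging.

All function names are hypothetical. This is not LTTng or kernel timer code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the mutex behind ltt_lock_traces(). */
static pthread_mutex_t traces_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int destroyer_holds_lock;
static atomic_int handler_gave_up;

/* Models the pre-patch PREEMPT_RT timer handler, which took the traces
 * mutex. A timed lock replaces the plain lock so the demo reports the
 * deadlock instead of hanging forever. */
static void *handler_takes_mutex(void *arg)
{
    (void)arg;
    /* Force the bad interleaving: the handler runs while teardown
     * already holds the mutex. */
    while (!atomic_load(&destroyer_holds_lock))
        sched_yield();

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1; /* a plain lock would wait forever here */
    if (pthread_mutex_timedlock(&traces_mutex, &deadline) == ETIMEDOUT) {
        atomic_store(&handler_gave_up, 1);
        return NULL;
    }
    /* ... walk the trace list ... */
    pthread_mutex_unlock(&traces_mutex);
    return NULL;
}

/* Models a handler that never touches the traces mutex, which is the
 * shape of the patched RT path (rcu_read_lock() instead of the mutex). */
static void *handler_lock_free(void *arg)
{
    (void)arg;
    while (!atomic_load(&destroyer_holds_lock))
        sched_yield();
    /* ... walk the trace list inside a read-side section ... */
    return NULL;
}

/* Models the teardown path: hold the traces mutex, then wait for the
 * handler to finish, as del_timer_sync() does. Returns 1 if the handler
 * would have blocked forever, 0 otherwise. */
static int simulate_teardown(void *(*handler)(void *))
{
    pthread_t timer;
    atomic_store(&destroyer_holds_lock, 0);
    atomic_store(&handler_gave_up, 0);
    int rc = pthread_create(&timer, NULL, handler, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&traces_mutex);   /* like ltt_lock_traces() */
    atomic_store(&destroyer_holds_lock, 1);
    pthread_join(timer, NULL);           /* like del_timer_sync() */
    pthread_mutex_unlock(&traces_mutex); /* like ltt_unlock_traces() */
    return atomic_load(&handler_gave_up);
}
```

simulate_teardown(handler_takes_mutex) returns 1 after the one-second deadline expires. simulate_teardown(handler_lock_free) returns 0 right away, because once the handler no longer needs the mutex, waiting for it while holding the mutex is harmless.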
Can you try the following patch to see if it fixes your problem?


lttng fix rt kernel teardown deadlock

LTTng has a teardown bug on RT (deadlock): the teardown path deletes a timer
synchronously while holding the traces mutex, and the timer handler takes the
same mutex, which leads to a deadlock.

Fix this by taking an RCU read lock in the timer handler (instead of the
RT-specific fix that used the mutex), and by doing synchronize_rcu() in
addition to synchronize_sched() upon updates.

Signed-off-by: Mathieu Desnoyers <[email protected]>
---
 ltt/ltt-tracer.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: linux-2.6-lttng/ltt/ltt-tracer.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-tracer.c	2010-05-11 07:50:46.000000000 -0400
+++ linux-2.6-lttng/ltt/ltt-tracer.c	2010-05-11 07:55:46.000000000 -0400
@@ -41,6 +41,14 @@
 #include <linux/vmalloc.h>
 #include <asm/atomic.h>
 
+static void synchronize_trace(void)
+{
+	synchronize_sched();
+#ifdef CONFIG_PREEMPT_RT
+	synchronize_rcu();
+#endif
+}
+
 static void async_wakeup(unsigned long data);
 static DEFINE_TIMER(ltt_async_wakeup_timer, async_wakeup, 0, 0);
 
@@ -321,7 +329,7 @@ void ltt_module_unregister(enum ltt_modu
 		ltt_filter_unregister();
 		ltt_run_filter_owner = NULL;
 		/* Wait for preempt sections to finish */
-		synchronize_sched();
+		synchronize_trace();
 		break;
 	case LTT_FUNCTION_FILTER_CONTROL:
 		ltt_filter_control_functor = ltt_filter_control_default;
@@ -429,13 +437,13 @@ static void async_wakeup(unsigned long d
 	 * PREEMPT_RT does not allow spinlocks to be taken within preempt
 	 * disable sections (spinlock taken in wake_up). However, mainline won't
 	 * allow mutex to be taken in interrupt context. Ugly.
-	 * A proper way to do this would be to turn the timer into a
-	 * periodically woken up thread, but it adds to the footprint.
+	 * Take a standard RCU read lock for RT kernels, which imply that we
+	 * also have to synchronize_rcu() upon updates.
 	 */
 #ifndef CONFIG_PREEMPT_RT
 	rcu_read_lock_sched();
 #else
-	ltt_lock_traces();
+	rcu_read_lock();
 #endif
 	list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
 		trace_async_wakeup(trace);
@@ -443,7 +451,7 @@ static void async_wakeup(unsigned long d
 #ifndef CONFIG_PREEMPT_RT
 	rcu_read_unlock_sched();
 #else
-	ltt_unlock_traces();
+	rcu_read_unlock();
 #endif
 
 	mod_timer(&ltt_async_wakeup_timer, jiffies + LTT_PERCPU_TIMER_INTERVAL);
@@ -901,7 +909,7 @@ int ltt_trace_alloc(const char *trace_na
 		set_kernel_trace_flag_all_tasks();
 	}
 	list_add_rcu(&trace->list, &ltt_traces.head);
-	synchronize_sched();
+	synchronize_trace();
 
 	ltt_unlock_traces();
 
@@ -974,7 +982,7 @@ static int _ltt_trace_destroy(struct ltt
 	}
 	/* Everything went fine */
 	list_del_rcu(&trace->list);
-	synchronize_sched();
+	synchronize_trace();
 	if (list_empty(&ltt_traces.head)) {
 		clear_kernel_trace_flag_all_tasks();
 		/*
@@ -1195,7 +1203,7 @@ static int _ltt_trace_stop(struct ltt_tr
 				trace->nr_channels);
 		trace->active = 0;
 		ltt_traces.num_active_traces--;
-		synchronize_sched(); /* Wait for each tracing to be finished */
+		synchronize_trace(); /* Wait for each tracing to be finished */
 	}
 	module_put(ltt_run_filter_owner);
 	/* Everything went fine */
@@ -1327,12 +1335,12 @@ static void __exit ltt_exit(void)
 	list_for_each_entry_rcu(trace, &ltt_traces.head, list)
 		_ltt_trace_stop(trace);
 	/* Wait for quiescent state. Readers have preemption disabled. */
-	synchronize_sched();
+	synchronize_trace();
 	/* Safe iteration is now permitted. It does not have to be RCU-safe
 	 * because no readers are left. */
 	list_for_each_safe(pos, n, &ltt_traces.head) {
 		trace = container_of(pos, struct ltt_trace, list);
-		/* _ltt_trace_destroy does a synchronize_sched() */
+		/* _ltt_trace_destroy does a synchronize_trace() */
 		_ltt_trace_destroy(trace);
 		__ltt_trace_destroy(trace);
 	}
>
> Thanks
>
> JP
>
> -----Original Message-----
> From: Mathieu Desnoyers [mailto:[email protected]]
> Sent: Thursday, April 22, 2010 12:06 PM
> To: John P. Paul
> Cc: [email protected]
> Subject: Re: [ltt-dev] lttctl locks up with RT Linux
>
> * [email protected] ([email protected]) wrote:
> > Greetings:
> >
> > I'm using a 2.6.33.2 kernel. I have LTT up and running on the plain
> > vanilla kernel, but "lttctl -D trace1" never returns on the RT version
> > of the same kernel. I've downloaded and integrated the following pieces:
> >
> > patch-2.6.33.2-lttng-0.211
> > ltt-control-0.84-07042010
> > lttv-0.12.31.04072010
> >
> > Note that I've had to manually apply several of the patches from the
> > patch file. I can provide a list if desired.
> >
> > After the lockup, I can do an ls on the /tmp/trace directory and see
> > that the following files have a non-zero length (the remaining files in
> > the trace directory have zero length):
> >
> > fs_0, fs_1, kernel_0, kernel_1
> >
> > I'm running on an Intel Core2 Duo system. I've built all the LTT
> > components into the kernel, so I do not have to load any modules at
> > runtime. I do execute an ltt-armall prior to issuing any
> > "lttctl -C -w /tmp/trace trace1" commands.
> >
> > When the above occurs, I usually have to hard power down the machine,
> > as a "reboot" issued by root does not reboot the machine (even after
> > trying to kill the running ltt processes).
> >
> > Any suggestions on how to get this working under the RT kernel would
> > be appreciated. Does LTT even function properly for RT kernels? If not,
> > it would be of great benefit to have it do so. Please let me know if
> > additional debug info would be helpful.
>
> I bet there is something fishy on RT with __ltt_trace_destroy(). Having
> an output of where the CPU is stalled in lttng code would help.
>
> >
> > A couple of additional notes:
> >
> > - LTTV docs state that it requires glib 2.4 or greater. I believe this
I believe this > is incorrect due to the following dependency: > > > > $ rpm -qa glib2 > > glib2-2.12.3-4.el5_3.1 << my default glib (RHEL5.x base) > > > > state.c: In function 'copy_process_state': > > state.c:1344: error: 'GHashTableIter' undeclared (first use in this > function) > > > > I've installed glib-2.22.5 to get around the above issue. > > OK, the dependency seems to be glib 2.16 now. Will update the README > and LTTng manual accordingly. > > Thanks, > > Mathieu > > > > -- > This is an e-mail from General Dynamics Robotic Systems. It is for the > intended recipient only and may contain confidential and privileged > information. No one else may read, print, store, copy, forward or act in > reliance on it or its attachments. If you are not the intended recipient, > please return this message to the sender and delete the message and any > attachments from your computer. Your cooperation is appreciated. > > > _______________________________________________ > ltt-dev mailing list > [email protected] > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com _______________________________________________ ltt-dev mailing list [email protected] http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
