* [email protected] ([email protected]) wrote:
> Hey Mathieu:
> 
> Thanks for looking at this. I'm a bit new to debugging at this level, so
> you may need to provide me a bit more info on what you need. I attempted
> to use "pstack" on the lttctl and lttd tasks ... no luck as pstack also
> locked up.
> 
> I put a bit of tracing into liblttctl and discovered the lockup occurs
> when a write of "traceName" (whatever traceName happens to be) occurs to
> the "/mnt/debugfs/ltt/destroy_trace" file.
> 
> I'm guessing that you would like some tracing of the ltt kernel module.
> Is there something that I can turn on, or another way I could get a
> stack dump of that module after lockup?  I'll do a little research this
> weekend on kernel debugging techniques.
> 
> I can certainly sprinkle in some printk statements in the ltt kernel
> module source. Doing so provided the following info:
> 
> - Control entered _ltt_trace_destroy (single underscore)
> - Control entered del_timer_sync(&ltt_async_wakeup_timer) and never
> returned
> 
> Does that help, or should I continue farther down this path?

Can you try the following patch to see if it fixes your problem?


lttng fix rt kernel teardown deadlock

LTTng has a teardown bug on RT (deadlock):

The trace teardown path deletes the timer synchronously (del_timer_sync())
while holding the traces mutex, and the timer handler takes the same mutex,
which leads to a deadlock.

Fix this by taking an RCU read lock in the timer handler (instead of the
RT-specific fix using the mutex), and by doing synchronize_rcu() in addition
to synchronize_sched() upon updates.

Signed-off-by: Mathieu Desnoyers <[email protected]>
---
 ltt/ltt-tracer.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: linux-2.6-lttng/ltt/ltt-tracer.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-tracer.c       2010-05-11 07:50:46.000000000 -0400
+++ linux-2.6-lttng/ltt/ltt-tracer.c    2010-05-11 07:55:46.000000000 -0400
@@ -41,6 +41,14 @@
 #include <linux/vmalloc.h>
 #include <asm/atomic.h>
 
+static void synchronize_trace(void)
+{
+       synchronize_sched();
+#ifdef CONFIG_PREEMPT_RT
+       synchronize_rcu();
+#endif
+}
+
 static void async_wakeup(unsigned long data);
 
 static DEFINE_TIMER(ltt_async_wakeup_timer, async_wakeup, 0, 0);
@@ -321,7 +329,7 @@ void ltt_module_unregister(enum ltt_modu
                ltt_filter_unregister();
                ltt_run_filter_owner = NULL;
                /* Wait for preempt sections to finish */
-               synchronize_sched();
+               synchronize_trace();
                break;
        case LTT_FUNCTION_FILTER_CONTROL:
                ltt_filter_control_functor = ltt_filter_control_default;
@@ -429,13 +437,13 @@ static void async_wakeup(unsigned long d
         * PREEMPT_RT does not allow spinlocks to be taken within preempt
         * disable sections (spinlock taken in wake_up). However, mainline won't
         * allow mutex to be taken in interrupt context. Ugly.
-        * A proper way to do this would be to turn the timer into a
-        * periodically woken up thread, but it adds to the footprint.
+        * Take a standard RCU read lock for RT kernels, which implies that we
+        * also have to synchronize_rcu() upon updates.
         */
 #ifndef CONFIG_PREEMPT_RT
        rcu_read_lock_sched();
 #else
-       ltt_lock_traces();
+       rcu_read_lock();
 #endif
        list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
                trace_async_wakeup(trace);
@@ -443,7 +451,7 @@ static void async_wakeup(unsigned long d
 #ifndef CONFIG_PREEMPT_RT
        rcu_read_unlock_sched();
 #else
-       ltt_unlock_traces();
+       rcu_read_unlock();
 #endif
 
        mod_timer(&ltt_async_wakeup_timer, jiffies + LTT_PERCPU_TIMER_INTERVAL);
@@ -901,7 +909,7 @@ int ltt_trace_alloc(const char *trace_na
                set_kernel_trace_flag_all_tasks();
        }
        list_add_rcu(&trace->list, &ltt_traces.head);
-       synchronize_sched();
+       synchronize_trace();
 
        ltt_unlock_traces();
 
@@ -974,7 +982,7 @@ static int _ltt_trace_destroy(struct ltt
        }
        /* Everything went fine */
        list_del_rcu(&trace->list);
-       synchronize_sched();
+       synchronize_trace();
        if (list_empty(&ltt_traces.head)) {
                clear_kernel_trace_flag_all_tasks();
                /*
@@ -1195,7 +1203,7 @@ static int _ltt_trace_stop(struct ltt_tr
                        trace->nr_channels);
                trace->active = 0;
                ltt_traces.num_active_traces--;
-               synchronize_sched(); /* Wait for each tracing to be finished */
+               synchronize_trace(); /* Wait for each tracing to be finished */
        }
        module_put(ltt_run_filter_owner);
        /* Everything went fine */
@@ -1327,12 +1335,12 @@ static void __exit ltt_exit(void)
        list_for_each_entry_rcu(trace, &ltt_traces.head, list)
                _ltt_trace_stop(trace);
        /* Wait for quiescent state. Readers have preemption disabled. */
-       synchronize_sched();
+       synchronize_trace();
        /* Safe iteration is now permitted. It does not have to be RCU-safe
         * because no readers are left. */
        list_for_each_safe(pos, n, &ltt_traces.head) {
                trace = container_of(pos, struct ltt_trace, list);
-               /* _ltt_trace_destroy does a synchronize_sched() */
+               /* _ltt_trace_destroy does a synchronize_trace() */
                _ltt_trace_destroy(trace);
                __ltt_trace_destroy(trace);
        }
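
For anyone following the thread, the deadlock described in the patch header
can be modelled in user space. This is only a rough sketch with pthreads, not
kernel code: traces_mutex, active_readers and all function names below are
hypothetical stand-ins, pthread_join() plays the role of del_timer_sync(), and
an atomic counter stands in for the RCU read-side section. The old handler
uses trylock so the demo reports the conflict rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the traces mutex (ltt_lock_traces()). */
static pthread_mutex_t traces_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical reader count standing in for an RCU read-side section. */
static atomic_int active_readers;

/*
 * Old RT handler: needs the traces mutex. trylock is used so that the
 * demo reports the conflict instead of hanging like the real handler.
 */
static void *handler_with_mutex(void *blocked)
{
	int ret = pthread_mutex_trylock(&traces_mutex);

	if (ret == 0)
		pthread_mutex_unlock(&traces_mutex);
	*(bool *)blocked = (ret == EBUSY);
	return NULL;
}

/* New handler: read-side section only, never touches the mutex. */
static void *handler_with_read_lock(void *done)
{
	atomic_fetch_add(&active_readers, 1);	/* models rcu_read_lock() */
	/* ... the trace list would be walked here ... */
	atomic_fetch_sub(&active_readers, 1);	/* models rcu_read_unlock() */
	*(bool *)done = true;
	return NULL;
}

/*
 * Teardown: run the handler on another thread and wait for it while
 * holding the mutex, as del_timer_sync() waits for a running handler.
 * Returns the flag set by the handler.
 */
static bool teardown_flag(void *(*handler)(void *))
{
	pthread_t timer;
	bool flag = false;

	pthread_mutex_lock(&traces_mutex);
	if (pthread_create(&timer, NULL, handler, &flag) == 0)
		pthread_join(timer, NULL);	/* models del_timer_sync() */
	pthread_mutex_unlock(&traces_mutex);
	return flag;
}

/* True if the mutex-taking handler could not make progress. */
static bool old_handler_would_deadlock(void)
{
	return teardown_flag(handler_with_mutex);
}

/* True if the read-lock handler ran to completion during teardown. */
static bool new_handler_completes(void)
{
	return teardown_flag(handler_with_read_lock);
}
```

In this model, old_handler_would_deadlock() should report that the handler
cannot get the mutex while teardown holds it, which is the situation where a
blocking lock plus del_timer_sync() never returns. new_handler_completes()
shows why a read-side section avoids the problem: the handler no longer
depends on anything the teardown path holds. The real patch additionally
needs synchronize_rcu() on updates, which this sketch does not model.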


> 
> Thanks
> 
> JP
> 
> -----Original Message-----
> From: Mathieu Desnoyers [mailto:[email protected]] 
> Sent: Thursday, April 22, 2010 12:06 PM
> To: John P. Paul
> Cc: [email protected]
> Subject: Re: [ltt-dev] lttctl locks up with RT Linux
> 
> * [email protected] ([email protected]) wrote:
> > Greetings:
> > 
> > I'm using a 2.6.33.2 kernel. I have LTT up and running on the plain
> vanilla kernel, but "lttctl -D trace1" never returns on the RT version
> of the same kernel. I've downloaded and integrated the following pieces:
> > 
> > patch-2.6.33.2-lttng-0.211
> > ltt-control-0.84-07042010
> > lttv-0.12.31.04072010
> > 
> > Note that I've had to manually apply several of the patches from the
> patch file. I can provide a list if desired.
> > 
> > After the lockup, I can do an ls on the /tmp/trace directory and see
> that the following files have a non-zero length (remaining files in the
> trace directory have a zero length):
> > 
> > fs_0, fs_1, kernel_0, kernel_1
> > 
> > I'm running on an Intel Core2 Duo system. I've built all the LTT
> components into the kernel, so I do not have to load any modules at
> runtime. I do execute an ltt-armall prior to issuing any "lttctl -C -w
> /tmp/trace trace1" commands.
> > 
> > When the above occurs, I usually have to hard power down the machine
> as a root issued "reboot" does not reboot the machine (even after trying
> to kill the running ltt processes).
> > 
> > Any suggestions on how to get this working under the RT kernel would
> be appreciated. Does LTT even function properly for RT kernels? If not,
> it would be of great benefit to have it do so.  Please let me know if
> additional debug info would be helpful. 
> 
> I bet there is something fishy on RT with __ltt_trace_destroy(). Having
> an output of where the CPU is stalled in lttng code would help.
> 
> 
> > 
> > A couple additional notes:
> > 
> > - LTTV docs state that it requires glib 2.4 or greater. I believe this
> is incorrect due to the following dependency:
> > 
> > $ rpm -qa glib2
> > glib2-2.12.3-4.el5_3.1  << my default glib (RHEL5.x base)
> > 
> > state.c: In function 'copy_process_state':
> > state.c:1344: error: 'GHashTableIter' undeclared (first use in this
> function)
> > 
> > I've installed glib-2.22.5 to get around the above issue.
> 
> OK, the dependency seems to be glib 2.16 now. Will update the README
> and LTTng manual accordingly.
> 
> Thanks,
> 
> Mathieu
> 
> 
> 
> --
> This is an e-mail from General Dynamics Robotic Systems. It is for the 
> intended recipient only and may contain confidential and privileged 
> information. No one else may read, print, store, copy, forward or act in 
> reliance on it or its attachments. If you are not the intended recipient, 
> please return this message to the sender and delete the message and any 
> attachments from your computer. Your cooperation is appreciated.
> 
> 
> _______________________________________________
> ltt-dev mailing list
> [email protected]
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
