Update - Strange, after typing my previous email I checked again, and all of a sudden Unqueue was pulling packets again. Not sure if me breaking into Click with gdb kick-started, it again :) or if it's a intermittent issue.
On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <[email protected]> wrote: > Hi Eddie, > > Since running this merge I've been experiencing issues with usermode > multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d > I do believe the issues are related to the changes of this merged, as I was > running the same config on pre-coreperformance merge code without any > issues. > > *Simplified Config used:* > > fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14) -> > ipc1::IPClassifier(udp port 1646); > sfp::SFP(); > q::ThreadSafeQueue; > log::Logger(sfp, 1800); > > ipc1 -> RadAccounting -> q; > q -> uq::Unqueue -> aupcc::Counter -> sfp; > > StaticThreadSched(fd 0, uq 1, log 2); > > *Problem Description:* > The 3 threads in the config: FromDAG and Unqueue do what their names > indicate, and Logger has a timer that schedules its task once every 30min. > Everything runs fine initially and then somewhere along the line Unqueue > stops pulling packets from the queue which leads to a constant queue > overflow. When this happens I can see FromDAG is still working as q.drops > increases constantly, and Logger also fires away every 30min. Note push rate > from FromDAG is quite high, but what RadAccounting pushes into queue is very > low ~ 100pps, which means queue is most of the time empty and Unqueue is not > scheduled. > > Unfortunately I didn't configure debug scheduling, which would've helped. I > did notice Unqueue.scheduled always returned true, even though it never > actually ran. Not sure if the following will help, gdb shows the following: > > (gdb) info threads > 3 Thread 0x7fa422559700 (LWP 27205) 0x000000392cedb0b3 in poll () from > /lib64/libc.so.6 > 2 Thread 0x7fa421d58700 (LWP 27206) 0x000000392cedcee3 in select () from > /lib64/libc.so.6 > * 1 Thread 0x7fa432cfb740 (LWP 27197) FromDAG::process_packet > (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at > ../elements/local/fromdag.cc:193 > (gdb) thread 2 > [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0 > 0x000000392cedb0b3 in poll () from /lib64/libc.so.6 > (gdb) bt > #0 0x000000392cedb0b3 in poll () from /lib64/libc.so.6 > #1 0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0, > thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862 > #2 0x000000000058c4ec in Master::run_selects (this=0x1afacc0, > thread=0x1afb8b0) at ../lib/master.cc:1050 > #3 0x000000000057ddf1 in run_os (this=0x1afb8b0) at > ../lib/routerthread.cc:447 > #4 RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568 > #5 0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) > at click.cc:414 > #6 0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0 > #7 0x000000392cee4aad in clone () from /lib64/libc.so.6 > (gdb) thread 3 > [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0 > 0x000000392cedcee3 in select () from /lib64/libc.so.6 > (gdb) bt > #0 0x000000392cedcee3 in select () from /lib64/libc.so.6 > #1 0x000000000058c4bf in Master::run_selects (this=0x1afacc0, > thread=0x1afb7e0) at ../lib/master.cc:1015 > #2 0x000000000057ddf1 in run_os (this=0x1afb7e0) at > ../lib/routerthread.cc:447 > #3 RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568 > #4 0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) > at click.cc:414 > #5 0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0 > #6 0x000000392cee4aad in clone () from /lib64/libc.so.6 > (gdb) thread 1 > [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0 > 0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0 > (gdb) bt > #0 FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, > rlen=<value optimized out>) at ../elements/local/fromdag.cc:193 > #1 0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at > ../elements/local/fromdag.cc:150 > #2 0x000000000057dbe6 in fire (this=0x1afb710) at > ../include/click/task.hh:612 > #3 run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410 > #4 RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600 > #5 0x0000000000558513 in main (argc=<value optimized out>, argv=<value > optimized out>) at click.cc:639 > > One thing to note, for various reasons I'm doing something very > unclick-like with FromDAG where I allow it to block up to 10ms. For my > specific requirements this is not a problem, but just in case it might > affect the way the new task handling operates it's worth noting. > > Beyers > > On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <[email protected]> wrote: > >> Hi all, >> >> This is just a note to say that I've merged the "coreperformance" branch >> with master. There are several changes that may speed up particularly >> simple configurations, and that (more importantly) may make it easier to >> experiment with different multithreading setups. (For instance I >> believe switching a task from one thread to another is faster now.) Let >> me know if you experience any problems >> >> Eddie >> _______________________________________________ >> click mailing list >> [email protected] >> https://amsterdam.lcs.mit.edu/mailman/listinfo/click >> > > _______________________________________________ click mailing list [email protected] https://amsterdam.lcs.mit.edu/mailman/listinfo/click
