Hooray! I'm very glad it was something simple (in fact, silly). Cliff is still welcome to teach a concurrency class any time :) :) :)
E


On 2/12/11 3:41 PM, Beyers Cronje wrote:
> Nice, been running for a day or so without any signs of the issue I was
> experiencing before, well done.
>
> Thanks Bobby and Eddie, much appreciated.
>
> On Fri, Feb 11, 2011 at 8:46 AM, Eddie Kohler <[email protected]> wrote:
>
> > Bobby,
> >
> > THANKS!!! Totally right; thanks for the fix. I found another bug as
> > well, and fixed it.
> >
> > Awesome!! Much appreciated.
> >
> > Eddie
> >
> >
> > On 02/10/2011 11:40 AM, Bobby Longpocket wrote:
> > > I think the problem is just that _any_pending is never getting set, so
> > > RouterThread::active() returns false even if there are tasks on the
> > > pending list.
> > >
> > > I don't run click the normal way, so I can't easily reproduce the
> > > issue, but try making the following change:
> > >
> > > In RouterThread::active(), replace both occurrences of _any_pending
> > > with _pending_head.
> > >
> > > diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
> > > index a405e1c..504706d 100644
> > > --- a/include/click/routerthread.hh
> > > +++ b/include/click/routerthread.hh
> > > @@ -231,9 +231,9 @@ inline bool
> > >  RouterThread::active() const
> > >  {
> > >  #if HAVE_TASK_HEAP
> > > -    return _task_heap.size() != 0 || _any_pending;
> > > +    return _task_heap.size() != 0 || _pending_head;
> > >  #else
> > > -    return ((const Task *)_next != this) || _any_pending;
> > > +    return ((const Task *)_next != this) || _pending_head;
> > >  #endif
> > >  }
> > >
> > >
> > > --- On Thu, 2/10/11, Eddie Kohler <[email protected]> wrote:
> > >
> > > From: Eddie Kohler <[email protected]>
> > > Subject: Re: [Click] Core performance checkins
> > > To: "Beyers Cronje" <[email protected]>
> > > Cc: "Click" <[email protected]>
> > > Date: Thursday, February 10, 2011, 6:15 AM
> > >
> > > > I'm very sorry about this regression... Unfortunately due to travel
> > > > it's going to be hard to look at this until the weekend. I would
> > > > rather not yet revert coreperformance. Can you handle this situation?
> > > >
> > > > Eddie
> > > >
> > > >
> > > > On 02/09/2011 05:22 AM, Beyers Cronje wrote:
> > > > > Hi Eddie,
> > > > >
> > > > > Some info that I'm sure will help debugging. I configured click
> > > > > with --enable-schedule-debugging=extra and also enabled
> > > > > NOTIFIERQUEUE_DEBUG.
> > > > >
> > > > > It seems Unqueue gets stuck in the pending list. See the output below:
> > > > >
> > > > > read q.length
> > > > > 1000
> > > > >
> > > > > read q.notifier_state
> > > > > notifier on
> > > > > task 0x19387b0 [uq :: Unqueue] scheduled
> > > > >
> > > > > read uq.scheduled
> > > > > true /* but pending */
> > > > >
> > > > > read uq.notifier
> > > > > empty.0/1:1*
> > > > >
> > > > > Unqueue stays in this scheduled but pending state for an undetermined
> > > > > period of time, up to minutes sometimes.
> > > > >
> > > > > Any idea where I can start on fixing this bug?
> > > > >
> > > > > Beyers
> > > > >
> > > > >
> > > > > On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <[email protected]> wrote:
> > > > >
> > > > > > Update - Strange, after typing my previous email I checked again,
> > > > > > and all of a sudden Unqueue was pulling packets again. Not sure if
> > > > > > me breaking into Click with gdb kick-started it again :) or if it's
> > > > > > an intermittent issue.
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Eddie,
> > > > > > >
> > > > > > > Since running this merge I've been experiencing issues with
> > > > > > > usermode multithreading. I'm using commit
> > > > > > > 9419098acbdc20837e37f3033c40661809431f8d. I do believe the issues
> > > > > > > are related to the changes of this merge, as I was running the
> > > > > > > same config on pre-coreperformance merge code without any issues.
> > > > > > >
> > > > > > > *Simplified Config used:*
> > > > > > >
> > > > > > > fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14)
> > > > > > >     -> ipc1::IPClassifier(udp port 1646);
> > > > > > > sfp::SFP();
> > > > > > > q::ThreadSafeQueue;
> > > > > > > log::Logger(sfp, 1800);
> > > > > > >
> > > > > > > ipc1 -> RadAccounting -> q;
> > > > > > > q -> uq::Unqueue -> aupcc::Counter -> sfp;
> > > > > > >
> > > > > > > StaticThreadSched(fd 0, uq 1, log 2);
> > > > > > >
> > > > > > > *Problem Description:*
> > > > > > > The 3 threads in the config: FromDAG and Unqueue do what their names
> > > > > > > indicate, and Logger has a timer that schedules its task once every
> > > > > > > 30min. Everything runs fine initially and then somewhere along the
> > > > > > > line Unqueue stops pulling packets from the queue which leads to a
> > > > > > > constant queue overflow. When this happens I can see FromDAG is still
> > > > > > > working as q.drops increases constantly, and Logger also fires away
> > > > > > > every 30min. Note push rate from FromDAG is quite high, but what
> > > > > > > RadAccounting pushes into queue is very low ~ 100pps, which means
> > > > > > > queue is most of the time empty and Unqueue is not scheduled.
> > > > > > >
> > > > > > > Unfortunately I didn't configure debug scheduling, which would've
> > > > > > > helped. I did notice Unqueue.scheduled always returned true, even
> > > > > > > though it never actually ran.
> > > > > > > Not sure if the following will help, gdb shows the following:
> > > > > > >
> > > > > > > (gdb) info threads
> > > > > > >   3 Thread 0x7fa422559700 (LWP 27205) 0x000000392cedb0b3 in poll ()
> > > > > > >     from /lib64/libc.so.6
> > > > > > >   2 Thread 0x7fa421d58700 (LWP 27206) 0x000000392cedcee3 in select ()
> > > > > > >     from /lib64/libc.so.6
> > > > > > > * 1 Thread 0x7fa432cfb740 (LWP 27197) FromDAG::process_packet
> > > > > > >     (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>)
> > > > > > >     at ../elements/local/fromdag.cc:193
> > > > > > > (gdb) thread 2
> > > > > > > [Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0
> > > > > > >     0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> > > > > > > (gdb) bt
> > > > > > > #0 0x000000392cedb0b3 in poll () from /lib64/libc.so.6
> > > > > > > #1 0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0,
> > > > > > >     thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
> > > > > > > #2 0x000000000058c4ec in Master::run_selects (this=0x1afacc0,
> > > > > > >     thread=0x1afb8b0) at ../lib/master.cc:1050
> > > > > > > #3 0x000000000057ddf1 in run_os (this=0x1afb8b0) at
> > > > > > >     ../lib/routerthread.cc:447
> > > > > > > #4 RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
> > > > > > > #5 0x0000000000556dc9 in thread_driver (user_data=<value optimized out>)
> > > > > > >     at click.cc:414
> > > > > > > #6 0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> > > > > > > #7 0x000000392cee4aad in clone () from /lib64/libc.so.6
> > > > > > > (gdb) thread 3
> > > > > > > [Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0
> > > > > > >     0x000000392cedcee3 in select () from /lib64/libc.so.6
> > > > > > > (gdb) bt
> > > > > > > #0 0x000000392cedcee3 in select () from /lib64/libc.so.6
> > > > > > > #1 0x000000000058c4bf in Master::run_selects (this=0x1afacc0,
> > > > > > >     thread=0x1afb7e0) at ../lib/master.cc:1015
> > > > > > > #2 0x000000000057ddf1 in run_os (this=0x1afb7e0) at
> > > > > > >     ../lib/routerthread.cc:447
> > > > > > > #3 RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
> > > > > > > #4 0x0000000000556dc9 in thread_driver (user_data=<value optimized out>)
> > > > > > >     at click.cc:414
> > > > > > > #5 0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
> > > > > > > #6 0x000000392cee4aad in clone () from /lib64/libc.so.6
> > > > > > > (gdb) thread 1
> > > > > > > [Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0
> > > > > > >     0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
> > > > > > > (gdb) bt
> > > > > > > #0 FromDAG::process_packet (this=0x1afc570,
> > > > > > >     erf_record=0x7fa423258bf0, rlen=<value optimized out>) at
> > > > > > >     ../elements/local/fromdag.cc:193
> > > > > > > #1 0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at
> > > > > > >     ../elements/local/fromdag.cc:150
> > > > > > > #2 0x000000000057dbe6 in fire (this=0x1afb710) at
> > > > > > >     ../include/click/task.hh:612
> > > > > > > #3 run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
> > > > > > > #4 RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
> > > > > > > #5 0x0000000000558513 in main (argc=<value optimized out>,
> > > > > > >     argv=<value optimized out>) at click.cc:639
> > > > > > >
> > > > > > > One thing to note, for various reasons I'm doing something very
> > > > > > > unclick-like with FromDAG where I allow it to block up to 10ms. For my
> > > > > > > specific requirements this is not a problem, but just in case it might
> > > > > > > affect the way the new task handling operates it's worth noting.
> > > > > > >
> > > > > > > Beyers
> > > > > > >
> > > > > > > On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > This is just a note to say that I've merged the "coreperformance"
> > > > > > > > branch with master. There are several changes that may speed up
> > > > > > > > particularly simple configurations, and that (more importantly) may
> > > > > > > > make it easier to experiment with different multithreading setups.
> > > > > > > > (For instance I believe switching a task from one thread to another
> > > > > > > > is faster now.)
> > > > > > > > Let me know if you experience any problems
> > > > > > > >
> > > > > > > > Eddie
> > > > > > > > _______________________________________________
> > > > > > > > click mailing list
> > > > > > > > [email protected]
> > > > > > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
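
For readers skimming the thread, here is a rough toy model of the bug Bobby describes: an "is there work?" check that consults a flag nobody sets, instead of looking at the pending list itself. This is only a sketch under stated assumptions. `ToyTask`, `ToyThread`, `add_pending`, `active_buggy`, `active_fixed` and `would_sleep_with_pending_work` are hypothetical names made up for illustration; this is not Click's actual RouterThread code, and the real pending-list handling may differ.

```cpp
// Toy model only: ToyThread is a hypothetical stand-in, not Click's RouterThread.
#include <cassert>
#include <vector>

struct ToyTask {
    ToyTask *next_pending = nullptr;   // link used while on the pending list
};

struct ToyThread {
    std::vector<ToyTask *> run_queue;  // tasks ready to run right now
    ToyTask *pending_head = nullptr;   // tasks scheduled but not yet processed
    bool any_pending = false;          // meant to mirror pending_head; never set (the bug)

    // Put a task on the pending list. As in the reported bug, this path
    // forgets to update any_pending.
    void add_pending(ToyTask *t) {
        t->next_pending = pending_head;
        pending_head = t;
    }

    // Buggy check: trusts a flag that is never set.
    bool active_buggy() const {
        return !run_queue.empty() || any_pending;
    }

    // Fixed check: looks at the pending list directly.
    bool active_fixed() const {
        return !run_queue.empty() || pending_head != nullptr;
    }
};

// Returns true if the thread would consider itself idle even though a task
// is sitting on its pending list.
inline bool would_sleep_with_pending_work(bool use_fixed_check) {
    ToyThread thread;
    ToyTask task;
    thread.add_pending(&task);
    bool active = use_fixed_check ? thread.active_fixed() : thread.active_buggy();
    return !active;
}
```

In this model, `would_sleep_with_pending_work(false)` reports the stuck state (a task is pending but the thread looks idle), while `would_sleep_with_pending_work(true)` does not. Checking the list head directly removes the need to keep a separate flag in sync, which appears to be the idea behind the patch quoted above.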
