Nice, it's been running for a day or so without any sign of the issue I was experiencing before. Well done.
Thanks Bobby and Eddie, much appreciated.

On Fri, Feb 11, 2011 at 8:46 AM, Eddie Kohler <[email protected]> wrote:

Bobby,

THANKS!!! Totally right; thanks for the fix. I found another bug as well, and fixed it.

Awesome!! Much appreciated.

Eddie

On 02/10/2011 11:40 AM, Bobby Longpocket wrote:

I think the problem is just that _any_pending is never getting set, so RouterThread::active() returns false even if there are tasks on the pending list.

I don't run click the normal way, so I can't easily reproduce the issue, but try making the following change: in RouterThread::active(), replace both occurrences of _any_pending with _pending_head.

diff --git a/include/click/routerthread.hh b/include/click/routerthread.hh
index a405e1c..504706d 100644
--- a/include/click/routerthread.hh
+++ b/include/click/routerthread.hh
@@ -231,9 +231,9 @@ inline bool
 RouterThread::active() const
 {
 #if HAVE_TASK_HEAP
-    return _task_heap.size() != 0 || _any_pending;
+    return _task_heap.size() != 0 || _pending_head;
 #else
-    return ((const Task *)_next != this) || _any_pending;
+    return ((const Task *)_next != this) || _pending_head;
 #endif
 }

--- On Thu, 2/10/11, Eddie Kohler <[email protected]> wrote:

From: Eddie Kohler <[email protected]>
Subject: Re: [Click] Core performance checkins
To: "Beyers Cronje" <[email protected]>
Cc: "Click" <[email protected]>
Date: Thursday, February 10, 2011, 6:15 AM

I'm very sorry about this regression... Unfortunately, due to travel it's going to be hard to look at this until the weekend. I would rather not revert coreperformance yet. Can you handle this situation?

Eddie

On 02/09/2011 05:22 AM, Beyers Cronje wrote:

Hi Eddie,

Some info that I'm sure will help debugging. I configured click with --enable-schedule-debugging=extra and also enabled NOTIFIERQUEUE_DEBUG.

It seems Unqueue gets stuck in the pending list. See the output below:

read q.length
1000

read q.notifier_state
notifier on
task 0x19387b0 [uq :: Unqueue] scheduled

read uq.scheduled
true /* but pending */

read uq.notifier
empty.0/1:1*

Unqueue stays in this scheduled-but-pending state for an undetermined period of time, up to minutes sometimes.

Any idea where I can start on fixing this bug?

Beyers

On Wed, Feb 9, 2011 at 2:52 AM, Beyers Cronje <[email protected]> wrote:

Update - Strange, after typing my previous email I checked again, and all of a sudden Unqueue was pulling packets again. Not sure if breaking into Click with gdb kick-started it again :) or if it's an intermittent issue.

On Wed, Feb 9, 2011 at 2:41 AM, Beyers Cronje <[email protected]> wrote:

Hi Eddie,

Since running this merge I've been experiencing issues with usermode multithreading. I'm using commit 9419098acbdc20837e37f3033c40661809431f8d. I do believe the issues are related to the changes in this merge, as I was running the same config on pre-coreperformance merge code without any issues.
*Simplified config used:*

fd::FromDAG(/dev/dag0) -> cl1::Classifier(12/0800) -> MarkIPHeader(14) -> ipc1::IPClassifier(udp port 1646);
sfp::SFP();
q::ThreadSafeQueue;
log::Logger(sfp, 1800);

ipc1 -> RadAccounting -> q;
q -> uq::Unqueue -> aupcc::Counter -> sfp;

StaticThreadSched(fd 0, uq 1, log 2);

*Problem description:*
The config uses three threads. FromDAG and Unqueue do what their names indicate, and Logger has a timer that schedules its task once every 30 min. Everything runs fine initially, and then somewhere along the line Unqueue stops pulling packets from the queue, which leads to a constant queue overflow. When this happens I can see FromDAG is still working, as q.drops increases constantly, and Logger also fires away every 30 min. Note the push rate from FromDAG is quite high, but what RadAccounting pushes into the queue is very low, ~100 pps, which means the queue is empty most of the time and Unqueue is not scheduled.

Unfortunately I didn't configure debug scheduling, which would've helped. I did notice Unqueue.scheduled always returned true, even though the task never actually ran.
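The scheduled-but-idle symptom above is the classic shape of a lost wakeup in a queue/notifier handshake. As a rough illustration only (hypothetical stand-in classes, not Click's real Notifier or Unqueue API), the handshake this config depends on looks roughly like this:

```cpp
#include <cassert>
#include <deque>

// Simplified model of the queue/notifier handshake (not Click's actual
// classes). A pull task unschedules itself when the queue runs dry; the
// queue's "nonempty" notifier must reschedule it on the next push. If
// that wakeup is lost, the task sits idle while the queue overflows.
struct PullTask {
    bool scheduled = false;
    void reschedule() { scheduled = true; }
    void unschedule() { scheduled = false; }
};

struct NotifyingQueue {
    std::deque<int> q;
    PullTask *listener = nullptr;
    bool notifier_on = true;  // "queue may be nonempty" signal

    void push(int pkt) {
        q.push_back(pkt);
        if (!notifier_on) {           // puller is asleep: wake it
            notifier_on = true;
            if (listener) listener->reschedule();
        }
    }

    bool pull(int &pkt) {
        if (q.empty()) {
            notifier_on = false;      // sleep until the next push
            if (listener) listener->unschedule();
            return false;
        }
        pkt = q.front();
        q.pop_front();
        return true;
    }
};
```

In the failure reported here, the reschedule half of the handshake evidently fired (uq.scheduled read true), but the thread never noticed the newly pending task, so the effect was the same as a lost wakeup.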
Not sure if the following will help; gdb shows the following:

(gdb) info threads
  3 Thread 0x7fa422559700 (LWP 27205)  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
  2 Thread 0x7fa421d58700 (LWP 27206)  0x000000392cedcee3 in select () from /lib64/libc.so.6
* 1 Thread 0x7fa432cfb740 (LWP 27197)  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fa421d58700 (LWP 27206))]#0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x000000392cedb0b3 in poll () from /lib64/libc.so.6
#1  0x000000000058bc36 in Master::run_selects_poll (this=0x1afacc0, thread=0x1afb8b0, more_tasks=false) at ../lib/master.cc:862
#2  0x000000000058c4ec in Master::run_selects (this=0x1afacc0, thread=0x1afb8b0) at ../lib/master.cc:1050
#3  0x000000000057ddf1 in run_os (this=0x1afb8b0) at ../lib/routerthread.cc:447
#4  RouterThread::driver (this=0x1afb8b0) at ../lib/routerthread.cc:568
#5  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
#6  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
#7  0x000000392cee4aad in clone () from /lib64/libc.so.6

(gdb) thread 3
[Switching to thread 3 (Thread 0x7fa422559700 (LWP 27205))]#0  0x000000392cedcee3 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x000000392cedcee3 in select () from /lib64/libc.so.6
#1  0x000000000058c4bf in Master::run_selects (this=0x1afacc0, thread=0x1afb7e0) at ../lib/master.cc:1015
#2  0x000000000057ddf1 in run_os (this=0x1afb7e0) at ../lib/routerthread.cc:447
#3  RouterThread::driver (this=0x1afb7e0) at ../lib/routerthread.cc:568
#4  0x0000000000556dc9 in thread_driver (user_data=<value optimized out>) at click.cc:414
#5  0x000000392d206d5b in start_thread () from /lib64/libpthread.so.0
#6  0x000000392cee4aad in clone () from /lib64/libc.so.6

(gdb) thread 1
[Switching to thread 1 (Thread 0x7fa432cfb740 (LWP 27197))]#0  0x000000392d20ebfd in nanosleep () from /lib64/libpthread.so.0
(gdb) bt
#0  FromDAG::process_packet (this=0x1afc570, erf_record=0x7fa423258bf0, rlen=<value optimized out>) at ../elements/local/fromdag.cc:193
#1  0x00000000004d6402 in FromDAG::run_task (this=0x1afc570) at ../elements/local/fromdag.cc:150
#2  0x000000000057dbe6 in fire (this=0x1afb710) at ../include/click/task.hh:612
#3  run_tasks (this=0x1afb710) at ../lib/routerthread.cc:410
#4  RouterThread::driver (this=0x1afb710) at ../lib/routerthread.cc:600
#5  0x0000000000558513 in main (argc=<value optimized out>, argv=<value optimized out>) at click.cc:639

One thing to note: for various reasons I'm doing something very unclick-like with FromDAG, where I allow it to block up to 10ms. For my specific requirements this is not a problem, but it's worth noting just in case it might affect the way the new task handling operates.
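One detail in the backtrace stands out: thread 2 is parked in Master::run_selects_poll with more_tasks=false, i.e. it went to sleep believing no tasks were pending. The general defense against this class of lost wakeup is to re-test the real work predicate under the same synchronization the wakers use (or to sleep with a bounded timeout). A generic sketch of that pattern only, not Click's actual select loop:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Lost-wakeup-safe sleep: the work predicate is re-checked under the
// mutex on every wakeup, and wakers update it under the same mutex, so
// a stale earlier snapshot of "no work" cannot strand the sleeper.
struct Sleeper {
    std::mutex m;
    std::condition_variable cv;
    int pending = 0;  // count of submitted-but-unrun work items

    void submit() {
        { std::lock_guard<std::mutex> g(m); ++pending; }
        cv.notify_one();
    }

    // Returns true once work is available, false on timeout.
    bool wait_for_work(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m);
        return cv.wait_for(lk, timeout, [this] { return pending > 0; });
    }
};
```

The timeout doubles as a belt-and-braces recovery: even if a wakeup were lost, the sleeper would re-examine the predicate after at most one timeout period, which matches the "stuck for minutes, then recovers" behavior being much worse without such a bound.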
Beyers

On Sun, Feb 6, 2011 at 1:23 AM, Eddie Kohler <[email protected]> wrote:

Hi all,

This is just a note to say that I've merged the "coreperformance" branch with master. There are several changes that may speed up particularly simple configurations, and that (more importantly) may make it easier to experiment with different multithreading setups. (For instance, I believe switching a task from one thread to another is faster now.) Let me know if you experience any problems.

Eddie

_______________________________________________
click mailing list
[email protected]
https://amsterdam.lcs.mit.edu/mailman/listinfo/click
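As an addendum for readers outside the Click source tree: Bobby's patch at the top of the thread works because _pending_head is the pending list itself, while _any_pending was a separately maintained cache that some enqueue path evidently failed to set. A minimal sketch of that distinction, using hypothetical stand-in types rather than Click's real RouterThread and Task:

```cpp
#include <cassert>

// Hypothetical model of a thread's task bookkeeping (not Click's actual
// RouterThread). Tasks awaiting (re)scheduling sit on a singly linked
// "pending" list headed by _pending_head.
struct Task {
    Task *pending_next = nullptr;
};

struct MiniThread {
    Task *_pending_head = nullptr;  // the pending list itself
    bool _any_pending = false;      // cached flag; must be kept in sync

    // A correct enqueue path updates the cache...
    void add_pending(Task *t) {
        t->pending_next = _pending_head;
        _pending_head = t;
        _any_pending = true;
    }

    // ...but a path that forgets the flag (the kind of omission behind
    // this regression) leaves the cache stale while work is queued.
    void add_pending_forgot_flag(Task *t) {
        t->pending_next = _pending_head;
        _pending_head = t;
        // _any_pending deliberately left untouched
    }

    // Pre-fix logic: trusts the cached flag, which can go stale.
    bool active_via_flag() const { return _any_pending; }

    // Post-fix logic: consults the list head, which cannot disagree
    // with the list.
    bool active_via_head() const { return _pending_head != nullptr; }
};
```

With the buggy enqueue path, active_via_flag() reports an idle thread even though a task is waiting, which is exactly the "scheduled but pending" limbo seen in the debug output above; active_via_head() cannot make that mistake.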
