Hi Eddie, I'm happy to report that I've been unable to reproduce the bug using the latest commit. As far as I can tell, this fixes the issue I was having.
Eddie, Cliff, you guys rock, thanks so much for your assistance on this.

Kind regards

Beyers Cronje

On Fri, Oct 15, 2010 at 6:24 AM, Eddie Kohler <[email protected]> wrote:

> Beyers,
>
> I believe I've found the problem... or A problem. What does this commit
> do for you?
>
> http://github.com/kohler/click/commit/96471bf1c0873447de007969f5a1e49b1a947071
>
> Eddie
>
> On 10/13/2010 05:01 PM, Beyers Cronje wrote:
>
>> Hi Eddie,
>>
>> Thank you for your response and for looking into this. I keep forgetting
>> about your excellent Script element; during my tests I used a much more
>> rudimentary method of verifying the issue :) I used a Counter element
>> before and after the queue and manually read the counters' rate and
>> count handlers. After a while I noticed that both counters' rates dropped
>> to 0 and both counters' counts stopped increasing, *permanently*. From
>> what I can see, the tasks NEVER re-schedule themselves.
>>
>> The only way I found to reschedule them was to manually issue a reset on
>> the queue, which seems to re-activate the notifier signals and in turn
>> schedule the tasks.
>>
>> One thing I forgot to mention: I set SLEEPINESS_TRIGGER to 1, which seems
>> to trigger the condition quicker. Also, strangely, if I set the first
>> InfiniteSource LIMIT to 1000 instead of -1 (as per the original config
>> posted), it also seems to trigger the condition much quicker. In fact, I
>> couldn't reproduce the issue with both InfiniteSource elements set to -1
>> during a 10min test run, whereas with LIMIT 1000 it always happens within
>> 10min of testing time.
>>
>> I'm out of the office now, but will test with both your script and
>> Cliff's script tomorrow and post back my results.
>>
>> Beyers
>>
>> On Wed, Oct 13, 2010 at 5:06 PM, Eddie Kohler <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Beyers!
>>
>> I am not sure that I agree with your analysis about the queue
>> getting "stuck."
>>
>> Here is the configuration I used, a tweak of yours:
>>
>> //////////////////////////////////////////////////////////////
>> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>>     LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
>>
>> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>>     LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;
>>
>> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>>
>> Script(TYPE ACTIVE,
>>     init x 0,
>>     set x $(add $x 1),
>>     goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
>>     print "$x: is1.scheduled false, uq1.scheduled false, q1.length $(q1.length)",
>>     label foo,
>>     goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
>>     print "$x: is2.scheduled false, uq2.scheduled false, q2.length $(q2.length)",
>>     label bar,
>>     wait 0.5s,
>>     loop);
>> //////////////////////////////////////////////////////////////
>>
>> This should print a message whenever neither source nor sink is
>> scheduled.
>>
>> I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."
>>
>> If a queue were truly getting "stuck," we would expect to see
>> something like
>>
>> 745: is1.scheduled false, uq1.scheduled false
>> 746: is1.scheduled false, uq1.scheduled false
>> 747: is1.scheduled false, uq1.scheduled false
>> 748: is1.scheduled false, uq1.scheduled false
>> 749: is1.scheduled false, uq1.scheduled false
>>
>> with no gaps, because something would go unscheduled *and stay that
>> way*.
>>
>> But that's not what I see.
>> I see this:
>>
>> 1: is1.scheduled false, uq1.scheduled false, q1.length 17
>> 1: is2.scheduled false, uq2.scheduled false, q2.length 0
>> 20: is1.scheduled false, uq1.scheduled false, q1.length 14
>> 22: is1.scheduled false, uq1.scheduled false, q1.length 0
>> 28: is1.scheduled false, uq1.scheduled false, q1.length 14
>> 41: is1.scheduled false, uq1.scheduled false, q1.length 0
>> 47: is2.scheduled false, uq2.scheduled false, q2.length 0
>> 55: is2.scheduled false, uq2.scheduled false, q2.length 0
>> 56: is2.scheduled false, uq2.scheduled false, q2.length 0
>> 76: is2.scheduled false, uq2.scheduled false, q2.length 0
>>
>> *** Now this, on the other hand, I would expect. In a multithreaded
>> environment, notification will cause some delay between the upstream
>> and downstream sides of a queue. This is because the upstream side
>> must wake up the downstream side, or vice versa. The cross-thread
>> communication required for this can happen a bit slowly, leaving a
>> moment (maybe more than a moment) where neither task appears to be
>> scheduled. (In fact, at least one of the tasks will be
>> "pre-scheduled.")
>>
>> So my questions for you are these: Did you really mean "stuck"?
>> Have you verified that the tasks NEVER re-schedule themselves? Do
>> you agree with my analysis? Or are you saying that the "pauses"
>> are too long, even though the tasks do reschedule themselves?
>>
>> Eddie
>>
>> On 10/12/2010 05:23 PM, Beyers Cronje wrote:
>>
>> Hi Eddie, list,
>>
>> I've come across a multithreading bug in queue signal handling which
>> I can reproduce fairly consistently with userlevel multithreaded
>> Click. The symptoms are that either the upstream or downstream task
>> of the queue is unscheduled even though the notifier signal is
>> active; this occurs when the queue either becomes full or empty.
>> To illustrate, here is the config I use and some debug handler outputs:
>>
>> //******************** Config ************************
>> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>>     LIMIT 1000, STOP false) -> ThreadSafeQueue -> uq1::Unqueue -> Discard;
>> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>>     LIMIT -1, STOP false) -> q::ThreadSafeQueue -> uq2::Unqueue -> Discard;
>>
>> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>>
>> //********** Debug handler output when the upstream push task is stuck **********
>> read q.length
>> 200 Read handler 'q.length' OK
>> DATA 1
>> 0
>>
>> read q.fullnote_state
>> 200 Read handler 'q.fullnote_state' OK
>> DATA 131
>> empty notifier off
>> task 0x25b8830 [uq2 :: Unqueue] unscheduled
>> full notifier on
>> task 0x25b8350 [is2 :: InfiniteSource] unscheduled
>>
>> //********** Debug handler output when the downstream pull task is stuck **********
>> read q.length
>> 200 Read handler 'q.length' OK
>> DATA 4
>> 1000
>>
>> read q.fullnote_state
>> 200 Read handler 'q.fullnote_state' OK
>> DATA 131
>> empty notifier on
>> task 0x1c6f830 [uq2 :: Unqueue] unscheduled
>> full notifier off
>> task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
>> //*****************************************************************
>>
>> Clearly the notifier states are correct, but somehow the relevant
>> task is not rescheduled. The above config uses ThreadSafeQueue, but I
>> verified that the same issue occurs when using FullNoteQueue.
>>
>> The obvious places to look are ActiveNotifier::set_active and
>> FullNoteQueue::push_success/push_failure/pull_success/pull_failure,
>> but so far I haven't spotted anything wrong with the relevant code;
>> clearly I'm overlooking something.
>>
>> Have you or anyone else on the list got any suggestions?
>>
>> If it helps, I'm running Click source from a couple of weeks back, on
>> a default 64-bit Fedora Core 13 kernel with preemption enabled
>> (2.6.33.3-85.fc13.x86_64), Intel dual-core CPU.
>> I start userlevel Click with the following command:
>> 'click --threads=4 conf/threadtest.click -p 777'
>>
>> Cheers
>>
>> Beyers
>>
>> _______________________________________________
>> click mailing list
>> [email protected] <mailto:[email protected]>
>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
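The benign behavior Eddie describes — a transient moment where neither the source nor the sink task appears scheduled, because one side has unscheduled itself and the cross-thread wake for the other side is still in flight — can be sketched as a small sequential model. This is an illustrative sketch only, not Click code; all names (`Task`, `poll`, `pending_wakes`) are invented for the example.

```python
# Sequential model of the cross-thread handoff around a notifier queue.
# When the consumer drains the queue, it unschedules itself and posts a
# wake message to the producer's thread. Until that message is processed,
# a poller (like Eddie's Script loop) sees *neither* task scheduled, even
# though a wake is "pre-scheduled" (in flight). Illustrative names only.

class Task:
    def __init__(self, name):
        self.name = name
        self.scheduled = False

def poll(producer, consumer):
    """What the polling Script would observe at this instant."""
    return producer.scheduled or consumer.scheduled

producer = Task("is1")   # upstream InfiniteSource on thread 0
consumer = Task("uq1")   # downstream Unqueue on thread 1
pending_wakes = []       # cross-thread wake messages not yet processed

# 1. Consumer drains the queue and unschedules itself; because the
#    producer lives on another thread, it posts a wake message instead
#    of scheduling the producer directly.
consumer.scheduled = False
pending_wakes.append(producer)

# 2. A poll taken right now sees neither task scheduled -- the "gap"
#    in the mailing-list output -- although the wake is not lost.
assert poll(producer, consumer) is False
assert pending_wakes   # wake is pre-scheduled, still in flight

# 3. The producer's thread processes its pending wakes; the gap closes.
for task in pending_wakes:
    task.scheduled = True
pending_wakes.clear()

assert poll(producer, consumer) is True
print("transient gap observed, then closed")
```

The key distinction from a real hang is step 3: the wake message is merely delayed, never dropped, so the gap always closes on its own.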
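By contrast, the permanent hang Beyers reports (notifier on, task unscheduled forever) is the signature of a lost wakeup. One generic way such a state can arise — sketched below as a sequential interleaving, with invented names, and explicitly *not* a claim about what Click's code actually did — is a waker consulting a stale "already scheduled" snapshot and skipping the reschedule:

```python
# Generic lost-wakeup sketch (illustrative only, NOT Click's actual code).
# If the producer checks "is the consumer already scheduled?" using a
# stale snapshot and skips the wake, the end state is exactly the handler
# dump in the thread: notifier active, task unscheduled, forever.

class Task:
    def __init__(self, name):
        self.name = name
        self.scheduled = True

class Notifier:
    def __init__(self):
        self.active = True

queue_len = 0
empty_note = Notifier()   # the "empty notifier" from q.fullnote_state
uq2 = Task("uq2")         # downstream Unqueue task

# Consumer thread: queue looks empty, so it begins going to sleep.
empty_note.active = False
# Producer thread takes a snapshot of the consumer's state *now*,
# before the consumer has finished unscheduling itself.
stale_snapshot = uq2.scheduled          # True, about to become stale

# Producer thread: pushes a packet and reactivates the notifier, but,
# trusting the stale snapshot, skips rescheduling the consumer.
queue_len += 1
empty_note.active = True
if not stale_snapshot:
    uq2.scheduled = True                # never taken: snapshot said True

# Consumer thread finishes going to sleep, clobbering the (skipped) wake.
uq2.scheduled = False

# Resulting state matches the bug report: notifier on, task unscheduled,
# and nothing will ever wake it again.
assert empty_note.active and not uq2.scheduled
print("stuck: empty notifier on, uq2 unscheduled")
```

A reset handler on the queue breaks the deadlock only because it re-runs the notification logic from scratch, which matches Beyers's observation that a manual queue reset revives both tasks.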
