Great, glad it's fixed!

E
On 10/15/2010 06:35 AM, Beyers Cronje wrote:
> Hi Eddie,
>
> I'm happy to report that I've been unable to reproduce the bug using
> the latest commit. So as far as I can tell this fixed the issue I was
> having. Eddie, Cliff, you guys rock, thanks so much for your
> assistance on this.
>
> Kind regards
>
> Beyers Cronje
>
> On Fri, Oct 15, 2010 at 6:24 AM, Eddie Kohler <[email protected]> wrote:
> > Beyers,
> >
> > I believe I've found the problem... or A problem. What does this
> > commit do for you?
> >
> > http://github.com/kohler/click/commit/96471bf1c0873447de007969f5a1e49b1a947071
> >
> > Eddie
> >
> > On 10/13/2010 05:01 PM, Beyers Cronje wrote:
> > > Hi Eddie,
> > >
> > > Thank you for your response and for looking into this. I keep
> > > forgetting about your excellent Script element; during my tests I
> > > used a much more rudimentary method of verifying the issue :) I
> > > used a Counter element before and after the queue and manually
> > > read the counters' rate and count handlers. After a while I
> > > noticed that both counters' rates dropped to 0 and both counters'
> > > counts stopped increasing, *permanently*. From what I can see the
> > > tasks NEVER re-schedule themselves. The only way I found to
> > > reschedule them was to manually issue a reset on the queue, which
> > > seems to re-activate the notifier signals and in turn schedule
> > > the tasks.
> > >
> > > One thing I forgot to mention: I set SLEEPINESS_TRIGGER to 1,
> > > which seems to trigger the condition quicker. Also, strangely, if
> > > I set the first InfiniteSource LIMIT to 1000 instead of -1 (as
> > > per the original config posted) it also seems to trigger the
> > > condition much quicker. In fact I couldn't reproduce the issue
> > > with both InfiniteSource elements set to -1 during a 10min test
> > > run, whereas with LIMIT 1000 it always happens within 10min of
> > > testing time.
> > >
> > > I'm out of the office now, but will test with both your script
> > > and Cliff's script tomorrow and post back my results.
> > > Beyers
> > >
> > > On Wed, Oct 13, 2010 at 5:06 PM, Eddie Kohler <[email protected]> wrote:
> > > > Hi Beyers!
> > > >
> > > > I am not sure that I agree with your analysis about the queue
> > > > getting "stuck."
> > > >
> > > > Here is the configuration I used, a tweak of yours:
> > > >
> > > > //////////////////////////////////////////////////////////////
> > > > is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> > > >     LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue
> > > >     -> Discard;
> > > >
> > > > is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> > > >     LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue
> > > >     -> Discard;
> > > >
> > > > StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> > > >
> > > > Script(TYPE ACTIVE,
> > > >     init x 0,
> > > >     set x $(add $x 1),
> > > >     goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
> > > >     print "$x: is1.scheduled false, uq1.scheduled false, q1.length $(q1.length)",
> > > >     label foo,
> > > >     goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
> > > >     print "$x: is2.scheduled false, uq2.scheduled false, q2.length $(q2.length)",
> > > >     label bar,
> > > >     wait 0.5s,
> > > >     loop);
> > > > //////////////////////////////////////////////////////////////
> > > >
> > > > This should print a message whenever neither source nor sink is
> > > > scheduled.
> > > >
> > > > I also tried this with "FullNoteQueue" replaced with
> > > > "ThreadSafeQueue."
> > > >
> > > > If a queue was truly getting 'stuck,' we would expect to see
> > > > something like
> > > >
> > > > 745: is1.scheduled false, uq1.scheduled false
> > > > 746: is1.scheduled false, uq1.scheduled false
> > > > 747: is1.scheduled false, uq1.scheduled false
> > > > 748: is1.scheduled false, uq1.scheduled false
> > > > 749: is1.scheduled false, uq1.scheduled false
> > > > ...
> > > >
> > > > with no gaps, because something would go unscheduled *and stay
> > > > that way*.
> > > >
> > > > But that's not what I see.
> > > > I see this:
> > > >
> > > > 1: is1.scheduled false, uq1.scheduled false, q1.length 17
> > > > 1: is2.scheduled false, uq2.scheduled false, q2.length 0
> > > > 20: is1.scheduled false, uq1.scheduled false, q1.length 14
> > > > 22: is1.scheduled false, uq1.scheduled false, q1.length 0
> > > > 28: is1.scheduled false, uq1.scheduled false, q1.length 14
> > > > 41: is1.scheduled false, uq1.scheduled false, q1.length 0
> > > > 47: is2.scheduled false, uq2.scheduled false, q2.length 0
> > > > 55: is2.scheduled false, uq2.scheduled false, q2.length 0
> > > > 56: is2.scheduled false, uq2.scheduled false, q2.length 0
> > > > 76: is2.scheduled false, uq2.scheduled false, q2.length 0
> > > >
> > > > *** Now this, on the other hand, I would expect. In a
> > > > multithreaded environment, notification will cause some delay
> > > > between the upstream & downstream sides of a queue. This is
> > > > because the upstream side must wake up the downstream side, or
> > > > vice versa. The cross-thread communication required for this
> > > > can happen a bit slowly, leaving a moment (maybe more than a
> > > > moment) where neither task appears to be scheduled. (In fact,
> > > > at least one of the tasks will be "pre-scheduled.")
> > > >
> > > > So my questions for you are these: Did you really mean "stuck"?
> > > > Have you verified that the tasks NEVER re-schedule themselves?
> > > > Do you agree with my analysis? Or are you saying that the
> > > > "pauses" are too long, even though the tasks do reschedule
> > > > themselves?
> > > >
> > > > Eddie
> > > >
> > > > On 10/12/2010 05:23 PM, Beyers Cronje wrote:
> > > > > Hi Eddie, list,
> > > > >
> > > > > I've come across a multithreading bug in queue signal
> > > > > handling which I can reproduce fairly consistently with
> > > > > userlevel multithreaded Click. The symptoms are that either
> > > > > the upstream or downstream task of the queue is unscheduled
> > > > > even though the notifier signal is active; this occurs when
> > > > > the queue either becomes full or empty.
> > > > > To illustrate, here is the config I use and some debug
> > > > > handler outputs:
> > > > >
> > > > > //******************** Config ************************
> > > > > is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> > > > >     LIMIT 1000, STOP false) -> ThreadSafeQueue
> > > > >     -> uq1::Unqueue -> Discard;
> > > > > is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> > > > >     LIMIT -1, STOP false) -> q::ThreadSafeQueue
> > > > >     -> uq2::Unqueue -> Discard;
> > > > >
> > > > > StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> > > > >
> > > > > //***** Debug Handler Output when upstream push task is stuck *****
> > > > > read q.length
> > > > > 200 Read handler 'q.length' OK
> > > > > DATA 1
> > > > > 0
> > > > >
> > > > > read q.fullnote_state
> > > > > 200 Read handler 'q.fullnote_state' OK
> > > > > DATA 131
> > > > > empty notifier off
> > > > > task 0x25b8830 [uq2 :: Unqueue] unscheduled
> > > > > full notifier on
> > > > > task 0x25b8350 [is2 :: InfiniteSource] unscheduled
> > > > >
> > > > > //***** Debug Handler Output when downstream pull task is stuck *****
> > > > > read q.length
> > > > > 200 Read handler 'q.length' OK
> > > > > DATA 4
> > > > > 1000
> > > > >
> > > > > read q.fullnote_state
> > > > > 200 Read handler 'q.fullnote_state' OK
> > > > > DATA 131
> > > > > empty notifier on
> > > > > task 0x1c6f830 [uq2 :: Unqueue] unscheduled
> > > > > full notifier off
> > > > > task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
> > > > > //*****************************************************************
> > > > >
> > > > > Clearly the notifier states are correct, but somehow the
> > > > > relevant task is not rescheduled. The above config uses
> > > > > ThreadSafeQueue, but I verified that the same issue occurs
> > > > > when using FullNoteQueue.
> > > > >
> > > > > The obvious places to look are ActiveNotifier::set_active and
> > > > > FullNoteQueue::push_success/push_failure/pull_success/pull_failure,
> > > > > but so far I haven't spotted anything wrong with the relevant
> > > > > code; clearly I'm overlooking something.
> > > > >
> > > > > Have you or anyone else on the list got any suggestions?
> > > > > If it helps, I'm running Click source from a couple of weeks
> > > > > back, on a default 64-bit Fedora Core 13 kernel with
> > > > > preemption enabled (2.6.33.3-85.fc13.x86_64), Intel Dual-Core
> > > > > CPU. I start userlevel Click with the following command:
> > > > > 'click --threads=4 conf/threadtest.click -p 777'
> > > > >
> > > > > Cheers
> > > > >
> > > > > Beyers

_______________________________________________
click mailing list
[email protected]
https://amsterdam.lcs.mit.edu/mailman/listinfo/click
