This script is very useful. Perhaps unsurprisingly, there is some problem with waking up a sleeping thread at userlevel; you may recall a lot of drama around that issue. The Script may help by causing threads to sleep for limited amounts of time. I will look.
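Eddie's bounded-sleep idea can be illustrated outside Click with a timed condition-variable wait: if a wakeup is ever lost, the sleeper still re-checks its work predicate after a bounded timeout instead of hanging forever. A minimal C++ sketch, assuming nothing about Click's internals (the function name, the counters, and the 10 ms interval are all invented for illustration):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative only: a consumer that never sleeps unboundedly, so a lost
// wakeup costs at most one timeout interval rather than a permanent hang.
int run_bounded_sleep_demo() {
    std::mutex m;
    std::condition_variable cv;
    int pending = 0;    // work items produced but not yet consumed
    int processed = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        while (processed < 3) {
            // wait_for returns after 10 ms even if notify_one was missed,
            // and the predicate is then re-checked under the lock.
            cv.wait_for(lock, std::chrono::milliseconds(10),
                        [&] { return pending > 0; });
            while (pending > 0) {
                --pending;
                ++processed;
            }
        }
    });

    for (int i = 0; i < 3; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            ++pending;
        }
        cv.notify_one();  // if this raced with the sleep, the timeout saves us
    }
    consumer.join();
    return processed;
}
```

The point is only the shape of the loop: the sleep is bounded, so correctness never depends on a wakeup being delivered, only throughput does.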
Eddie

On 10/13/2010 09:01 AM, Cliff Frey wrote:
> I am getting the stuck behavior that Beyers describes. However, if I add
> the script, the problem for the most part goes away.
>
> Specifically, if I use this config:
>
> //////////////////////////////////////////////////////////////
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>     LIMIT 1000, STOP false)
>     -> q1::ThreadSafeQueue
>     -> uq1::Unqueue
>     -> Discard;
>
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>     LIMIT -1, STOP false)
>     -> q2::ThreadSafeQueue
>     -> uq2::Unqueue
>     -> Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
>
> Script(TYPE ACTIVE,
>     init x 0,
>     set x $(add $x 1),
>     read uq2.count,
>     write uq2.reset_counts,
>     wait 5s,
>     loop);
> //////////////////////////////////////////////////////////////
>
> Then when I run with --threads=4, I get highly variable (and low) output:
>
> uq2.count:
> 11973
> uq2.count:
> 332166
> uq2.count:
> 86556
> uq2.count:
> 2884924
> uq2.count:
> 419946
>
> compared to when I run with --threads=1:
>
> uq2.count:
> 12183593
> uq2.count:
> 12186308
> uq2.count:
> 12091170
>
> This makes me think that it is a problem with the thread scheduling
> logic, not with ThreadSafeQueue, because the scheduling of the Script
> task seems to kick things correctly. I can look into this more, but
> I'll wait for some feedback from Eddie first...
>
> Cliff
>
> On Wed, Oct 13, 2010 at 8:06 AM, Eddie Kohler <[email protected]> wrote:
>
> Hi Beyers!
>
> I am not sure that I agree with your analysis about the queue getting
> "stuck."
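Cliff's observation that an unrelated periodic Script "kicks things correctly" suggests that any recurring pass which reschedules tasks in the "notifier active but unscheduled" state would mask the underlying bug. A hypothetical single-threaded model of such a watchdog pass (the struct and function are invented for illustration and are not Click's Task/Notifier API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of why a periodic "kicker" could mask the bug.
struct SimTask {
    bool notifier_active;  // e.g. the queue's empty/full notifier says "go"
    bool scheduled;        // is the task on a thread's work list?
};

// One watchdog pass: repair any "active but unscheduled" task -- exactly
// the stuck state Beyers' debug handlers show. Returns how many it fixed.
int kick_stuck_tasks(std::vector<SimTask>& tasks) {
    int kicked = 0;
    for (SimTask& t : tasks) {
        if (t.notifier_active && !t.scheduled) {
            t.scheduled = true;  // reschedule the task
            ++kicked;
        }
    }
    return kicked;
}
```

Tasks whose notifier is off, and tasks already scheduled, are left alone; only the inconsistent combination is repaired.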
>
> Here is the configuration I used, a tweak of yours:
>
> //////////////////////////////////////////////////////////////
> is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>     LIMIT -1, STOP false) -> q1::FullNoteQueue -> uq1::Unqueue -> Discard;
> is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
>     LIMIT -1, STOP false) -> q2::FullNoteQueue -> uq2::Unqueue -> Discard;
>
> StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> Script(TYPE ACTIVE,
>     init x 0,
>     set x $(add $x 1),
>     goto foo $(or $(is1.scheduled) $(uq1.scheduled)),
>     print "$x: is1.scheduled false, uq1.scheduled false, q1.length $(q1.length)",
>     label foo,
>     goto bar $(or $(is2.scheduled) $(uq2.scheduled)),
>     print "$x: is2.scheduled false, uq2.scheduled false, q2.length $(q2.length)",
>     label bar,
>     wait 0.5s,
>     loop);
> //////////////////////////////////////////////////////////////
>
> This should print a message whenever neither source nor sink is
> scheduled.
>
> I also tried this with "FullNoteQueue" replaced with "ThreadSafeQueue."
>
> If a queue were truly getting "stuck," we would expect to see something
> like
>
> 745: is1.scheduled false, uq1.scheduled false
> 746: is1.scheduled false, uq1.scheduled false
> 747: is1.scheduled false, uq1.scheduled false
> 748: is1.scheduled false, uq1.scheduled false
> 749: is1.scheduled false, uq1.scheduled false
> ...
>
> with no gaps, because something would go unscheduled *and stay that
> way*.
>
> But that's not what I see.
> I see this:
>
> 1: is1.scheduled false, uq1.scheduled false, q1.length 17
> 1: is2.scheduled false, uq2.scheduled false, q2.length 0
> 20: is1.scheduled false, uq1.scheduled false, q1.length 14
> 22: is1.scheduled false, uq1.scheduled false, q1.length 0
> 28: is1.scheduled false, uq1.scheduled false, q1.length 14
> 41: is1.scheduled false, uq1.scheduled false, q1.length 0
> 47: is2.scheduled false, uq2.scheduled false, q2.length 0
> 55: is2.scheduled false, uq2.scheduled false, q2.length 0
> 56: is2.scheduled false, uq2.scheduled false, q2.length 0
> 76: is2.scheduled false, uq2.scheduled false, q2.length 0
>
> *** Now this, on the other hand, I would expect. In a multithreaded
> environment, notification will cause some delay between the upstream and
> downstream sides of a queue. This is because the upstream side must wake
> up the downstream side, or vice versa. The cross-thread communication
> required for this can happen a bit slowly, leaving a moment (maybe more
> than a moment) where neither task appears to be scheduled. (In fact, at
> least one of the tasks will be "pre-scheduled.")
>
> So my questions for you are these: Did you really mean "stuck"? Have you
> verified that the tasks NEVER re-schedule themselves? Do you agree with
> my analysis? Or are you saying that the "pauses" are too long, even
> though the tasks do reschedule themselves?
>
> Eddie
>
>
> On 10/12/2010 05:23 PM, Beyers Cronje wrote:
> > Hi Eddie, list,
> >
> > I've come across a multithreading bug in queue signal handling which I
> > can reproduce fairly consistently with userlevel multithreading Click.
> > The symptoms are that either the upstream or downstream task from the
> > queue is unscheduled even though the notifier signal is active; this
> > occurs when the queue either becomes full or empty.
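The "pre-scheduled" window Eddie describes can be shown with a toy single-threaded model of the cross-thread wake handshake. This is not Click code; the mailbox and all names are invented, and the real mechanism involves actual threads, but the observable sequence is the same:

```cpp
#include <cassert>
#include <deque>

// Toy model of the wake handshake between the two sides of a queue.
struct ModelTask {
    bool scheduled = false;
};

// Returns true if an observer caught the "neither side scheduled" window.
bool observe_wake_window() {
    ModelTask upstream, downstream;
    std::deque<ModelTask*> mailbox;  // stands in for cross-thread wake delivery

    upstream.scheduled = true;       // source runs until the queue fills...
    upstream.scheduled = false;      // ...then goes to sleep...
    mailbox.push_back(&downstream);  // ...after posting a wake: "pre-scheduled"

    // A Script reading the .scheduled handlers at this instant sees both
    // tasks unscheduled, even though a wake is in flight.
    bool window = !upstream.scheduled && !downstream.scheduled;

    // The downstream thread eventually drains its mailbox and runs.
    while (!mailbox.empty()) {
        mailbox.front()->scheduled = true;
        mailbox.pop_front();
    }
    return window && downstream.scheduled;
}
```

The transient window is real and expected; only a wake that is posted but never drained would count as "stuck."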
> > To illustrate, here is the config I use and some debug handler outputs:
> >
> > //******************** Config ************************
> > is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> >     LIMIT 1000, STOP false) -> ThreadSafeQueue -> uq1::Unqueue -> Discard;
> > is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>,
> >     LIMIT -1, STOP false) -> q::ThreadSafeQueue -> uq2::Unqueue -> Discard;
> >
> > StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);
> >
> > //********** Debug handler output when the upstream push task is stuck **********
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 1
> > 0
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier off
> > task 0x25b8830 [uq2 :: Unqueue] unscheduled
> > full notifier on
> > task 0x25b8350 [is2 :: InfiniteSource] unscheduled
> >
> > //********** Debug handler output when the downstream pull task is stuck **********
> > read q.length
> > 200 Read handler 'q.length' OK
> > DATA 4
> > 1000
> >
> > read q.fullnote_state
> > 200 Read handler 'q.fullnote_state' OK
> > DATA 131
> > empty notifier on
> > task 0x1c6f830 [uq2 :: Unqueue] unscheduled
> > full notifier off
> > task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
> >
> > //*****************************************************************
> >
> > Clearly the notifier states are correct, but somehow the relevant task
> > is not rescheduled. The above config uses ThreadSafeQueue, but I
> > verified that the same issue occurs when using FullNoteQueue.
> > The obvious places to look are ActiveNotifier::set_active and
> > FullNoteQueue::push_success/push_failure/pull_success/pull_failure,
> > but so far I haven't spotted anything wrong with the relevant code;
> > clearly I'm overlooking something.
> >
> > Have you or anyone else on the list got any suggestions?
> >
> > If it helps, I'm running Click source from a couple of weeks back, on
> > a default 64-bit Fedora Core 13 kernel with preemption enabled
> > (2.6.33.3-85.fc13.x86_64), Intel dual-core CPU.
> > I start userlevel Click with the following command:
> > 'click --threads=4 conf/threadtest.click -p 777'
> >
> > Cheers
> >
> > Beyers

_______________________________________________
click mailing list
[email protected]
https://amsterdam.lcs.mit.edu/mailman/listinfo/click
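One classic way to arrive at exactly the state Beyers' handlers show (notifier active, task unscheduled) is a sleep decision based on a stale emptiness check. The sketch below is speculative: it models that race generically, not Click's actual set_active/push_failure code, and every name in it is invented.

```cpp
#include <cassert>

// Speculative model of the lost-wakeup race (invented names, not Click's code).
struct QState {
    bool notifier_active = false;  // "packets available" signal
    bool task_scheduled = true;    // downstream Unqueue-like task
};

// Buggy: the consumer decides to sleep from a stale "queue empty" check.
// If the producer's signal fired in between, the wakeup is clobbered.
void sleep_buggy(QState& s, bool saw_empty) {
    if (saw_empty)
        s.task_scheduled = false;
}

// Fixed: after unscheduling, re-check the notifier and reschedule if the
// signal arrived during the window, closing the race.
void sleep_fixed(QState& s, bool saw_empty) {
    if (saw_empty) {
        s.task_scheduled = false;
        if (s.notifier_active)
            s.task_scheduled = true;
    }
}
```

Replaying the bad interleaving (consumer samples an empty queue, producer pushes and activates the notifier, then the consumer finally sleeps) leaves the buggy version with `notifier_active && !task_scheduled`, while the fixed version reschedules itself.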
