Hi Eddie, list,

I've come across a multithreading bug in queue signal handling that I can reproduce fairly consistently with userlevel multithreaded Click. The symptom is that either the upstream or the downstream task of the queue is left unscheduled even though the corresponding notifier signal is active. This occurs when the queue becomes either full or empty. To illustrate, here is the config I use and some debug handler output:

//******************** Config ************************
is1::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>, LIMIT 1000, STOP false)
    -> ThreadSafeQueue
    -> uq1::Unqueue
    -> Discard;

is2::InfiniteSource(DATA \<00 00 c0 ae 67 ef 00 00 00 00 00 00 08 00>, LIMIT -1, STOP false)
    -> q::ThreadSafeQueue
    -> uq2::Unqueue
    -> Discard;

StaticThreadSched(is1 0, uq1 1, is2 2, uq2 3);

//******************** Debug Handler Output when upstream push task is stuck ************************
read q.length
200 Read handler 'q.length' OK
DATA 1
0
read q.fullnote_state
200 Read handler 'q.fullnote_state' OK
DATA 131
empty notifier off
task 0x25b8830 [uq2 :: Unqueue] unscheduled
full notifier on
task 0x25b8350 [is2 :: InfiniteSource] unscheduled

//******************** Debug Handler Output when downstream pull task is stuck ************************
read q.length
200 Read handler 'q.length' OK
DATA 4
1000
read q.fullnote_state
200 Read handler 'q.fullnote_state' OK
DATA 131
empty notifier on
task 0x1c6f830 [uq2 :: Unqueue] unscheduled
full notifier off
task 0x1c6f350 [is2 :: InfiniteSource] unscheduled
//*****************************************************************************************************************

Clearly the notifier states are correct, but somehow the relevant task is not rescheduled. The above config uses ThreadSafeQueue, but I verified that the same issue occurs when using FullNoteQueue.

The obvious places to look are ActiveNotifier::set_active and FullNoteQueue::push_success/push_failure/pull_success/pull_failure, but so far I haven't spotted anything wrong with the relevant code; clearly I'm overlooking something. Have you or anyone else on the list got any suggestions?

If it helps, I'm running Click source from a couple of weeks back on the default 64-bit Fedora Core 13 kernel with preemption enabled (2.6.33.3-85.fc13.x86_64), on an Intel Dual-Core CPU.

I start userlevel Click with the following command:

click --threads=4 conf/threadtest.click -p 777

Cheers
Beyers
