For what it's worth, Cliff reports that this fix did fix *the* problem. E

On 2/24/11 5:03 PM, Bobby Longpocket wrote:
> It may not be *the* bug, but if you hit it, it will produce *the*
> problem. :)
>
> I think the fix will work.
>
> --- On Thu, 2/24/11, Eddie Kohler <[email protected]> wrote:
>
>> From: Eddie Kohler <[email protected]>
>> Subject: Re: [Click] Conversion of threads
>> To: "Bobby Longpocket" <[email protected]>
>> Cc: "Cliff Frey" <[email protected]>, [email protected]
>> Date: Thursday, February 24, 2011, 4:23 PM
>>
>> Thank you very, very much for this very clear explanation.
>>
>> That sure is *a* bug (not sure it is *the*). What do you think of the
>> following fix?
>>
>> http://www.read.cs.ucla.edu/gitweb?p=click;a=commitdiff;h=0cbe4e8bdfbee6efe127cdd714c145cd80e89af9;hp=968964c3e344c2898681e9644d51b20669fef6cd
>>
>> Eddie
>>
>> On 02/24/2011 02:48 PM, Bobby Longpocket wrote:
>>> The bug is in ThreadSafeQueue.
>>>
>>> You can't treat head and tail independently in the push/pull
>>> operations. The result of the current code is that you can fill the
>>> queue beyond capacity (turning it into an 'empty' queue as a result).
>>>
>>> This happens because a thread can be using a stale head and tail, and
>>> this can go unnoticed if the current value of tail (in push) or head
>>> (in pull) has wrapped around to the original value.
>>>
>>> For instance, suppose a thread gets into push with _head==3 and
>>> _tail==1 (one slot left), then gets delayed for a while...
>>>
>>> ...meanwhile, other pushers and pullers work on the queue...
>>> head=3;tail=2 ... head=50;tail=49 ... head=2;tail=1...
>>>
>>> ...now our thread gets back to work. _tail/_xtail is 1 again, so we
>>> pass the compare-and-swap and we'll do our push with next-tail==2 even
>>> though h==2.
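
To make the interleaving above concrete, here is a minimal sketch of a
bounded ring buffer whose push tests fullness against a separately read
head and then compare-and-swaps only the tail index, which is the shape of
the problem Bobby describes. The names (ToyQueue, slots, and so on) are
invented for the illustration; this is not Click's actual ThreadSafeQueue
code.

    // toy_queue_race.cc -- minimal sketch of the stale-index race described
    // above.  NOT Click's ThreadSafeQueue; every name here is made up.
    // Build: g++ -std=c++11 toy_queue_race.cc
    #include <atomic>
    #include <cstdio>

    struct ToyQueue {
        static const int capacity = 8;        // ring buffer; one slot kept empty
        void *slots[capacity];
        std::atomic<int> head;                // next slot to pull from
        std::atomic<int> tail;                // next slot to push into

        ToyQueue() : head(0), tail(0) {}
        static int next(int i) { return (i + 1) % capacity; }

        bool push(void *p) {
            int h = head.load();              // (1) read head -- can go stale
            int t = tail.load();              // (2) read tail, say t == 1
            int nt = next(t);
            if (nt == h)                      // (3) fullness judged against stale h
                return false;
            // (4) suppose the thread stalls here while other pushers and
            //     pullers cycle the ring until head == 2 and tail is 1 again,
            //     exactly the head=3,tail=1 -> ... -> head=2,tail=1 run above
            if (!tail.compare_exchange_strong(t, nt))
                return false;                 // lost a race with another pusher
            // (5) in the stalled scenario the CAS *succeeds*, because tail has
            //     wrapped back to the value read at (2), even though the queue
            //     is now full
            slots[t] = p;                     // (6) fills the slot that had to
            return true;                      //     stay empty; head == tail now
        }                                     //     makes a full ring look empty
    };

    int main() {
        ToyQueue q;
        int pkt = 42;
        q.push(&pkt);                         // single-threaded use looks fine;
        std::printf("head=%d tail=%d\n",      // the damage needs the stall at (4)
                    q.head.load(), q.tail.load());
    }

The window only exists because the indices wrap: once tail has cycled back
to its old value, a compare-and-swap on tail alone cannot tell that head
has moved in the meantime.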
>>>
>>> --- On Thu, 2/24/11, Eddie Kohler <[email protected]> wrote:
>>>
>>>> From: Eddie Kohler <[email protected]>
>>>> Subject: Re: [Click] Conversion of threads
>>>> To: "Cliff Frey" <[email protected]>
>>>> Cc: [email protected]
>>>> Date: Thursday, February 24, 2011, 10:30 AM
>>>>
>>>> Hey
>>>>
>>>> I thought I found a bug in ThreadSafeQueue, but convinced myself it is
>>>> OK. There could be problems in the threading core (which has been
>>>> changed for core performance work). However, I do not observe Click
>>>> getting stuck. Rather, I observe it performing radically slowly. The
>>>> attached config prints counter information, etc. every second. The
>>>> counters keep going up. Although the InfiniteSources often look
>>>> unscheduled, this is because they *are* unscheduled much of the time;
>>>> the fast_reschedule() happens at the end of InfiniteSource::run_task().
>>>> I don't know what's going on. Commenting out all notification does not
>>>> help.
>>>>
>>>> Eddie
>>>>
>>>> On 2/24/11 8:12 AM, Cliff Frey wrote:
>>>>> I wanted to benchmark the difference between ThreadSafeQueue and
>>>>> one-queue-per-thread + RoundRobinScheduler + Unqueue + another-Queue.
>>>>> However, I ended up finding a bug.
>>>>>
>>>>> This config:
>>>>>
>>>>> elementclass Foo {
>>>>>     is :: InfiniteSource -> [0] output;
>>>>> };
>>>>> f0 :: Foo, f1 :: Foo, f2 :: Foo, f3 :: Foo, f4 :: Foo, f5 :: Foo
>>>>>     -> master_q :: ThreadSafeQueue
>>>>>     -> uq2 :: Unqueue
>>>>>     -> c :: Counter(COUNT_CALL 4000000 stop)
>>>>>     -> Discard;
>>>>> StaticThreadSched(
>>>>>     f0/is 0,
>>>>>     f1/is 1,
>>>>>     f2/is 2,
>>>>>     f3/is 3,
>>>>>     f4/is 4,
>>>>>     f5/is 5,
>>>>>     uq2 7);
>>>>>
>>>>> When run with 'click --threads=8 foo.click', this ends up getting
>>>>> stuck, with an empty master_q and none of the InfiniteSource elements
>>>>> scheduled. I'm running on an i7 CPU with 4 cores + hyperthreading,
>>>>> giving 8 CPUs from Linux's perspective.
>>>>>
>>>>> Also interesting is how many drops are seen at the queue in this case
>>>>> (not too surprising, really):
>>>>>
>>>>> read f0/is.scheduled
>>>>> false
>>>>> read master_q.length
>>>>> 0
>>>>> read master_q.drops
>>>>> 163540
>>>>> read c.count
>>>>> 231797
>>>>>
>>>>> Cliff
>>>>>
>>>>> On Thu, Feb 24, 2011 at 7:27 AM, Eddie Kohler <[email protected]> wrote:
>>>>>
>>>>>     Try ThreadSafeQueue at the converge point, which as Bobby points
>>>>>     out should be correct even when multithreaded, but doesn't require
>>>>>     locks. However, it will still be slowish, as would be anything
>>>>>     that tried to enforce strict order, since the queue pointers will
>>>>>     bounce among cores.
>>>>>
>>>>>     Eddie
>>>>>
>>>>>     On 2/23/11 9:36 PM, Philip Prindeville wrote:
>>>>>     > I was wondering... if I'm running multiple duplicate paths each
>>>>>     > on its own core and I eventually need to collect them up so I
>>>>>     > can resequence packets and pass them up into user-space... how
>>>>>     > feasible is it to go from threaded and lockless to converging
>>>>>     > into a single locked queue (fan-in)?
>>>>>     >
>>>>>     > I figure it would be the best of both worlds because most of the
>>>>>     > operations could happen threaded and without locking, and the
>>>>>     > locking only needs to be enforced in the final stage...
>>>>>     >
>>>>>     > Thanks.
>>>>>     >
>>>>>     > -Philip
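
Going back to where the thread started, a tiny sketch of the other
arrangement Philip asks about may help: per-core producers converging
(fan-in) on a single mutex-protected queue drained by one consumer. This is
plain C++ with invented names (LockedQueue and so on), offered only to make
the pattern concrete; it is not Click code.

    // fanin_sketch.cc -- the fan-in pattern from Philip's question: several
    // producer threads feed one locked queue, a single consumer drains it.
    // Build: g++ -std=c++11 -pthread fanin_sketch.cc
    #include <condition_variable>
    #include <cstddef>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <thread>
    #include <vector>

    class LockedQueue {
    public:
        explicit LockedQueue(std::size_t cap) : _cap(cap) {}

        // Producers call this; returns false (a "drop") when the queue is full.
        bool push(int v) {
            std::lock_guard<std::mutex> lock(_m);
            if (_q.size() >= _cap)
                return false;
            _q.push_back(v);
            _nonempty.notify_one();
            return true;
        }

        // The single consumer calls this; blocks until something arrives.
        int pull() {
            std::unique_lock<std::mutex> lock(_m);
            _nonempty.wait(lock, [this] { return !_q.empty(); });
            int v = _q.front();
            _q.pop_front();
            return v;
        }

    private:
        std::mutex _m;
        std::condition_variable _nonempty;
        std::deque<int> _q;
        std::size_t _cap;
    };

    int main() {
        LockedQueue q(1000);
        std::vector<std::thread> producers;
        for (int t = 0; t < 6; ++t)               // six sources, like f0..f5
            producers.emplace_back([&q, t] {
                for (int i = 0; i < 10000; ++i)
                    while (!q.push(t * 10000 + i)) {}   // retry instead of dropping
            });
        long long sum = 0;
        for (int i = 0; i < 6 * 10000; ++i)       // one consumer, like uq2 -> c
            sum += q.pull();
        for (std::thread &p : producers)
            p.join();
        std::printf("consumed %d values, checksum %lld\n", 6 * 10000, sum);
    }

Either way the converge point serializes the flow: with the locked queue it
is the mutex, and, as Eddie notes above, even the lock-free ThreadSafeQueue
pays for the head and tail pointers bouncing between cores.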
