Ooops, Actually I spoke commit hash. It was 381b5f8ff5fe2a1d7f26ac39b5591b99c4365bae (not a7dd5c93a51b3b5b8cd4a177ef379492c27859bf) So I was about to correct my last email. But I think Kevin found the problem exactly.
ACK to the patch. Thanks! Joonwoo 2008/9/5 <[EMAIL PROTECTED]>: > Perfect Joonwoo! > > commit 381b5f8ff5fe2a1d7f26ac39b5591b99c4365bae (Write handlers become > EXCLUSIVE by default) made all write handlers exclusive by default. When > click-uninstall writes to /click/config to clear the configuration it is > done with a now exclusive write handler (requiring all threads to be > locked). The action of the write handler then attempts to lock all tasks > again routerthread.cc:633. The write handler action is attempting to grab > a spinlock it already holds -> Deadlock. > > The attached patch simply marks the write handler as NONEXCLUSIVE. The > proper fix would be to remove lock_tasks() from > RouterThread::unschedule_router_tasks, if you can guarantee all routes > into the function have already locked the tasks. > > Thanks! > Kevin > > > >> Hi all, >> >> I bisected tree for this issue, and found following commit is the >> first bad commit. >> >> commit a7dd5c93a51b3b5b8cd4a177ef379492c27859bf >> click-combine documentation: Mention that RouterLink is fake. >> >> Any idea? >> >> Thanks, >> Joonwoo >> >> 2008/9/3 <[EMAIL PROTECTED]>: >>> Hi Eddie, >>> >>> Sorry for the wait. Today I pulled the latest sources and rebuilt a >>> vanilla click kernel on a 64-bit 4 core Xeon machine (with the click >>> e1000 >>> driver installed). The locking seems to be working fine, but >>> 'click-uninstall' is resulting in a lockup. See the attached dumps >>> (click >>> was started with click-install -t 3 in both cases). I still see the >>> lockup >>> with -t 1. >>> >>> Click uninstalls without problem with our version from a couple of weeks >>> back. >>> >>> Unrelated, but I also saw: >>> ../include/click/timestamp.hh: In constructor >>> 'Timestamp::Timestamp(double)': >>> ../include/click/timestamp.hh:938: warning: converting to 'int64_t' from >>> 'double' >>> all over when building userlevel click. >>> >>> Sorry I don't have the time to dive into this further right now. >>> >>> Thanks! >>> Kevin >>> >>> >>> >>> >>>> Kevin, >>>> >>>> Is there any chance you'd be able to try the newer Spinlock code any >>>> time >>>> soon? >>>> >>>> Eddie >>>> >>>> >>>> [EMAIL PROTECTED] wrote: >>>>> Hi, >>>>> >>>>> I think there is a concurrency issue with Spinlocks in linuxmodule >>>>> multi-threaded click (running a 2.6.19.2 patched kernel, the >>>>> e1000-NAPI >>>>> driver and today's trunk). I've put together a temporary patch, but >>>>> there >>>>> might be further issues. Sorry, I don't have the time to investigate >>>>> more. >>>>> >>>>> >>>>> Problems: >>>>> The series of events are: >>>>> -Thread A is running on CPU 0 and thread B is running on CPU 1. >>>>> -A acquires the Spinlock and executes code in the critical section. >>>>> -Linux schedules Thread A to run on CPU 1 and thread B to run on CPU >>>>> 0. >>>>> -B acquires the Spinlock (works because CPU 0 is the owner of the >>>>> lock, >>>>> not the thread) >>>>> -A and B are both operating in the critical section >>>>> -(if you're lucky) A releases the Spinlock and generates a "releasing >>>>> someone else's lock" message >>>>> -(if you're unlucky) Oops >>>>> >>>>> Attached is a test element and a configuration to elicit the behavior. >>>>> I >>>>> could only replicate the problem when using a polling input source >>>>> under >>>>> heavy load. >>>>> >>>>> After fixing the above problem I was still seeing two threads inside >>>>> the >>>>> CS. For some reason the atomic_uint32_t::swap was not working >>>>> atomically >>>>> as expected. Changing this to atomic_uint32::compare_and_swap solved >>>>> the >>>>> problem. We're running x86_64 quad core Xeon CPUs. >>>>> >>>>> >>>>> Solution: >>>>> In the patch attached I added a new function click_current_thread() >>>>> which >>>>> returns a thread_info pointer. Inside of Spinlock I replace >>>>> click_current_processor() uses with click_current_thread(). This way >>>>> even >>>>> if the thread is assigned to another core it will still have the same >>>>> thread_info pointer. >>>>> >>>>> The problem with this solution is that it does not work as an index >>>>> into >>>>> an array nicely and cannot simply replace all uses of >>>>> click_current_processor(). There may be other places in the code which >>>>> make the same incorrect use of click_current_processor(), but a better >>>>> approach will be needed if the value is used to index into an array >>>>> (such >>>>> as in ReadWriteLock). >>>>> >>>>> >>>>> Thanks! >>>>> Kevin Springborn >>>>> >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> _______________________________________________ >>>>> click mailing list >>>>> [email protected] >>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click >>>> >>> >>> _______________________________________________ >>> click mailing list >>> [email protected] >>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click >>> >>> >> > _______________________________________________ click mailing list [email protected] https://amsterdam.lcs.mit.edu/mailman/listinfo/click
