I have a dual Athlon XP, 3 NICs (2xIntel e100 and 1x Realtek), RH9 and never experienced this problem.
Luca
Burton M. Strauss III wrote:
Tigger is a dual CPU (P3-1000) running RH8 (2.4.20 kernel) with two NICs for ntop - one WAN, one LAN.
I'm seeing deadlocks, which look - under gdb - like they're occurring in the sched_yield() call. It actually looks like it's deadlocking the POSIX thread control thread.
If I disable the SCHED_YIELD, it seems better - at least I'm now up to 52m of run time vs. the usual 1-10 before lockup...
Specifically, I'm seeing this when I connect to the hung ntop via gdb:
(gdb) info thread
  9 Thread 114696 (LWP 27300)  0x40253c68 in recvfrom () from /lib/i686/libpthread.so.0
  8 Thread 98311 (LWP 27299)  0x4209ad41 in __tzfile_compute () from /lib/i686/libc.so.6
  7 Thread 81926 (LWP 27298)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  6 Thread 65541 (LWP 27297)  0x420b0226 in nanosleep () from /lib/i686/libc.so.6
  5 Thread 49156 (LWP 27296)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  4 Thread 32771 (LWP 27295)  0x420cd207 in sched_yield () from /lib/i686/libc.so.6
  3 Thread 16386 (LWP 27294)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  2 Thread 32769 (LWP 27293)  0x420db1a7 in poll () from /lib/i686/libc.so.6
  1 Thread 16384 (LWP 27290)  0x420b0226 in nanosleep () from /lib/i686/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 32771 (LWP 27295))]#0  0x420cd207 in sched_yield () from /lib/i686/libc.so.6
(gdb) info stac
#0  0x420cd207 in sched_yield () from /lib/i686/libc.so.6
#1  0x4008ff47 in freeHostSessions (host=0x41a1ebe8, theDevice=0) at hash.c:176
#2  0x400900a8 in freeHostInfo (host=0x41a1ebe8, actualDeviceId=0) at hash.c:232
#3  0x40090bd6 in purgeIdleHosts (actDevice=0) at hash.c:545
#4  0x40097f19 in scanIdleLoop (notUsed=0x0) at ntop.c:588
#5  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
Other threads that reference mutexes, semaphores, or anything else POSIX are hung, even though the mutex/semaphore shows as not locked:
[Switching to thread 7 (Thread 81926 (LWP 27298))]#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) info stack
#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x40252190 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x4024ed77 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage", fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
#5  0x4003c09e in handleHTTPrequest (from={s_addr = 3232246305}) at http.c:2590
#6  0x40068091 in handleSingleWebConnection (fdmask=0x4413aa3c) at webInterface.c:5423
#7  0x40067ed6 in handleWebConnections (notUsed=0x0) at webInterface.c:5288
#8  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 16386 (LWP 27294))]#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) info stac
#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4025163b in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
#3  0x400a8dd8 in waitSem (semId=0x400b8874) at util.c:1476
#4  0x4009d03a in dequeuePacket (notUsed=0x0) at pbuf.c:1693
#5  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
Oddly, ntop's extra data shows the mutex locked, but the internal flag (__m_lock) shows some weird status...
(gdb) frame 4
#4  0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage", fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
1143      rc = pthread_mutex_lock(&(mutexId->mutex));
(gdb) list
1138
1139      strcpy(mutexId->lockAttemptFile, fileName);
1140      mutexId->lockAttemptLine=fileLine;
1141      mutexId->lockAttemptPid=myPid;
1142
1143      rc = pthread_mutex_lock(&(mutexId->mutex));
1144
1145      pthread_mutex_lock(&stateChangeMutex);
1146      mutexId->lockAttemptFile[0] = '\0';
1147      mutexId->lockAttemptLine=0;
(gdb) print *mutexId
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
    __m_lock = {__status = 1142136828, __spinlock = 0}},
  isLocked = 1 '\001', isInitialized = 1 '\001',
  lockFile = "hash.c", '\0' <repeats 57 times>, lockLine = 465, lockPid = 27295,
  unlockFile = "http.c", '\0' <repeats 57 times>, unlockLine = 2618, unlockPid = 27298,
  numLocks = 361, numReleases = 360, lockTime = 1061844745,
  maxLockedDurationUnlockFile = "http.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 2618, maxLockedDuration = 1,
  where = "purgeIdleHosts", '\0' <repeats 49 times>,
  lockAttemptFile = "http.c", '\0' <repeats 57 times>,
  lockAttemptLine = 2590, lockAttemptPid = 27298}
vs. a normal LOCKED mutex:
(gdb) print myGlobals.packetProcessMutex
$4 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
    __m_lock = {__status = 1, __spinlock = 0}},
  isLocked = 1 '\001', isInitialized = 1 '\001',
  lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 1591, lockPid = 2849,
  unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 1602, unlockPid = 2849,
  numLocks = 50182, numReleases = 50181, lockTime = 1061847839,
  maxLockedDurationUnlockFile = "pbuf.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 1602, maxLockedDuration = 5,
  where = "queuePacket\000t", '\0' <repeats 50 times>,
  lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>,
  lockAttemptLine = 0, lockAttemptPid = 0}
and UNLOCKED:
(gdb) print myGlobals.purgePortsMutex
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
    __m_lock = {__status = 0, __spinlock = 0}},
  isLocked = 0 '\0', isInitialized = 1 '\001',
  lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 0, lockPid = 2849,
  unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 591, unlockPid = 2849,
  numLocks = 24424, numReleases = 24424, lockTime = 1061847839,
  maxLockedDurationUnlockFile = "ntop.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 572, maxLockedDuration = 1,
  where = "updateInterfacePorts", '\0' <repeats 43 times>,
  lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>,
  lockAttemptLine = 0, lockAttemptPid = 0}
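For anyone comparing the three dumps, here is a rough sketch of how bookkeeping of this kind could be kept around a pthread mutex. It is not ntop's actual util.c code: the type TrackedMutex and the functions trackedInit, trackedLock, trackedUnlock and trackedOutstanding are hypothetical names, and only a few of the fields from the dumps are reproduced. The point is that numLocks minus numReleases gives the number of acquisitions not yet released, which matches the dumps: 361 - 360 = 1 and 50182 - 50181 = 1 for the two locked mutexes, 24424 - 24424 = 0 for the unlocked one.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the bookkeeping structure seen in
 * the gdb dumps; only a handful of the fields are reproduced here. */
typedef struct {
    pthread_mutex_t mutex;
    char isLocked;
    char lockFile[64];
    int lockLine;
    unsigned int numLocks;
    unsigned int numReleases;
} TrackedMutex;

/* Protects the bookkeeping fields, similar in spirit to the
 * stateChangeMutex visible in the util.c listing. */
static pthread_mutex_t stateChangeMutex = PTHREAD_MUTEX_INITIALIZER;

int trackedInit(TrackedMutex *m) {
    memset(m, 0, sizeof(*m));
    return pthread_mutex_init(&m->mutex, NULL);
}

/* Take the real mutex first, then record who took it and where. */
int trackedLock(TrackedMutex *m, const char *file, int line) {
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc != 0) return rc;

    pthread_mutex_lock(&stateChangeMutex);
    m->isLocked = 1;
    strncpy(m->lockFile, file, sizeof(m->lockFile) - 1);
    m->lockFile[sizeof(m->lockFile) - 1] = '\0';
    m->lockLine = line;
    m->numLocks++;
    pthread_mutex_unlock(&stateChangeMutex);
    return 0;
}

/* Update the bookkeeping, then release the real mutex. */
int trackedUnlock(TrackedMutex *m) {
    pthread_mutex_lock(&stateChangeMutex);
    assert(m->isLocked);            /* releasing an unheld mutex is a bug */
    m->isLocked = 0;
    m->numReleases++;
    pthread_mutex_unlock(&stateChangeMutex);
    return pthread_mutex_unlock(&m->mutex);
}

/* Acquisitions not yet matched by a release: 1 while held, 0 when free. */
unsigned int trackedOutstanding(const TrackedMutex *m) {
    return m->numLocks - m->numReleases;
}
```

Read this way, the bookkeeping in the first dump (lockFile = "hash.c", lockLine = 465, lockPid = 27295) points at LWP 27295, which is thread 4 in the thread list, the one sitting in sched_yield().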
Anybody else having problems w/ 2.2.94 or another fairly recent version???
-----Burton
_______________________________________________
Ntop mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop
--
Luca Deri <[EMAIL PROTECTED]>
http://luca.ntop.org/
Hacker: someone who loves to program and enjoys being clever about it - Richard Stallman
