Tigger is a dual-CPU (P3-1000) box running RH8 (2.4.20 kernel) with two NICs
for ntop: one WAN, one LAN.
I'm seeing deadlocks which, under gdb, look like they occur in the
sched_yield() call. It actually looks like it's deadlocking the POSIX
thread control thread.
If I disable SCHED_YIELD, things seem better: I'm now up to 52 minutes of
run time vs. the usual 1-10 minutes before lockup.
Specifically, I'm seeing this when I connect to the hung ntop via gdb:
(gdb) info thread
9 Thread 114696 (LWP 27300) 0x40253c68 in recvfrom () from /lib/i686/libpthread.so.0
8 Thread 98311 (LWP 27299) 0x4209ad41 in __tzfile_compute () from /lib/i686/libc.so.6
7 Thread 81926 (LWP 27298) 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
6 Thread 65541 (LWP 27297) 0x420b0226 in nanosleep () from /lib/i686/libc.so.6
5 Thread 49156 (LWP 27296) 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
4 Thread 32771 (LWP 27295) 0x420cd207 in sched_yield () from /lib/i686/libc.so.6
3 Thread 16386 (LWP 27294) 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
2 Thread 32769 (LWP 27293) 0x420db1a7 in poll () from /lib/i686/libc.so.6
1 Thread 16384 (LWP 27290) 0x420b0226 in nanosleep () from /lib/i686/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 32771 (LWP 27295))]#0 0x420cd207 in sched_yield () from /lib/i686/libc.so.6
(gdb) info stac
#0 0x420cd207 in sched_yield () from /lib/i686/libc.so.6
#1 0x4008ff47 in freeHostSessions (host=0x41a1ebe8, theDevice=0) at hash.c:176
#2 0x400900a8 in freeHostInfo (host=0x41a1ebe8, actualDeviceId=0) at hash.c:232
#3 0x40090bd6 in purgeIdleHosts (actDevice=0) at hash.c:545
#4 0x40097f19 in scanIdleLoop (notUsed=0x0) at ntop.c:588
#5 0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
Other threads that use mutexes, semaphores, or anything else POSIX are hung,
even though the mutex/semaphore shows as not locked:
[Switching to thread 7 (Thread 81926 (LWP 27298))]#0 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) info stack
#0 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x40252190 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3 0x4024ed77 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4 0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage", fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
#5 0x4003c09e in handleHTTPrequest (from={s_addr = 3232246305}) at http.c:2590
#6 0x40068091 in handleSingleWebConnection (fdmask=0x4413aa3c) at webInterface.c:5423
#7 0x40067ed6 in handleWebConnections (notUsed=0x0) at webInterface.c:5288
#8 0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
(gdb) thread 3
[Switching to thread 3 (Thread 16386 (LWP 27294))]#0 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
(gdb) info stac
#0 0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1 0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2 0x4025163b in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
#3 0x400a8dd8 in waitSem (semId=0x400b8874) at util.c:1476
#4 0x4009d03a in dequeuePacket (notUsed=0x0) at pbuf.c:1693
#5 0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
Oddly, ntop's own bookkeeping fields show the mutex as locked, but the
internal LinuxThreads lock (__m_lock) holds a strange __status value:
(gdb) frame 4
#4 0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage", fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
1143 rc = pthread_mutex_lock(&(mutexId->mutex));
(gdb) list
1138
1139 strcpy(mutexId->lockAttemptFile, fileName);
1140 mutexId->lockAttemptLine=fileLine;
1141 mutexId->lockAttemptPid=myPid;
1142
1143 rc = pthread_mutex_lock(&(mutexId->mutex));
1144
1145 pthread_mutex_lock(&stateChangeMutex);
1146 mutexId->lockAttemptFile[0] = '\0';
1147 mutexId->lockAttemptLine=0;
(gdb) print *mutexId
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
__m_lock = {__status = 1142136828, __spinlock = 0}}, isLocked = 1 '\001', isInitialized = 1 '\001',
lockFile = "hash.c", '\0' <repeats 57 times>, lockLine = 465, lockPid = 27295,
unlockFile = "http.c", '\0' <repeats 57 times>, unlockLine = 2618, unlockPid = 27298,
numLocks = 361, numReleases = 360, lockTime = 1061844745,
maxLockedDurationUnlockFile = "http.c", '\0' <repeats 57 times>,
maxLockedDurationUnlockLine = 2618, maxLockedDuration = 1,
where = "purgeIdleHosts", '\0' <repeats 49 times>,
lockAttemptFile = "http.c", '\0' <repeats 57 times>, lockAttemptLine = 2590, lockAttemptPid = 27298}
Compare that with a normal LOCKED mutex:
(gdb) print myGlobals.packetProcessMutex
$4 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
__m_lock = {__status = 1, __spinlock = 0}}, isLocked = 1 '\001', isInitialized = 1 '\001',
lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 1591, lockPid = 2849,
unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 1602, unlockPid = 2849,
numLocks = 50182, numReleases = 50181, lockTime = 1061847839,
maxLockedDurationUnlockFile = "pbuf.c", '\0' <repeats 57 times>,
maxLockedDurationUnlockLine = 1602, maxLockedDuration = 5,
where = "queuePacket\000t", '\0' <repeats 50 times>,
lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>, lockAttemptLine = 0, lockAttemptPid = 0}
and an UNLOCKED one:
(gdb) print myGlobals.purgePortsMutex
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0,
__m_lock = {__status = 0, __spinlock = 0}}, isLocked = 0 '\0', isInitialized = 1 '\001',
lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 0, lockPid = 2849,
unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 591, unlockPid = 2849,
numLocks = 24424, numReleases = 24424, lockTime = 1061847839,
maxLockedDurationUnlockFile = "ntop.c", '\0' <repeats 57 times>,
maxLockedDurationUnlockLine = 572, maxLockedDuration = 1,
where = "updateInterfacePorts", '\0' <repeats 43 times>,
lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>, lockAttemptLine = 0, lockAttemptPid = 0}
Is anybody else having problems with 2.2.94 or another fairly recent version?
-----Burton
_______________________________________________
Ntop mailing list
http://listgateway.unipi.it/mailman/listinfo/ntop