Tigger is a dual CPU (P3-1000) running RH8 (2.4.20 kernel) with two NICs
(for ntop) - one WAN, one LAN.

I'm seeing deadlocks, which look - under gdb - like they're occurring in the
sched_yield() call.  It actually looks like it's deadlocking the POSIX
thread control thread.

If I disable the SCHED_YIELD, it seems better - at least I'm now up to 52
minutes of run time vs. the usual 1-10 minutes before lockup...
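
For anyone wanting to repeat that experiment, a compile-time switch around the
yield is probably the least invasive way to try it. The sketch below is only an
assumption about how such a switch could look, not ntop's actual code:
SKETCH_NO_YIELD and maybe_yield() are hypothetical names.

```c
#include <sched.h>

/* Hypothetical switch: build with -DSKETCH_NO_YIELD to drop the yield. */
static int maybe_yield(void)
{
#ifdef SKETCH_NO_YIELD
    return 0;               /* yield compiled out; caller just continues */
#else
    return sched_yield();   /* 0 on success, -1 with errno set on error */
#endif
}
```

Call sites would then use maybe_yield() wherever they call sched_yield() today,
so the two builds differ in nothing else.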

Specifically, I'm seeing this when I connect to the hung ntop via gdb:

(gdb) info thread
  9 Thread 114696 (LWP 27300)  0x40253c68 in recvfrom () from /lib/i686/libpthread.so.0
  8 Thread 98311 (LWP 27299)  0x4209ad41 in __tzfile_compute () from /lib/i686/libc.so.6
  7 Thread 81926 (LWP 27298)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  6 Thread 65541 (LWP 27297)  0x420b0226 in nanosleep () from /lib/i686/libc.so.6
  5 Thread 49156 (LWP 27296)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  4 Thread 32771 (LWP 27295)  0x420cd207 in sched_yield () from /lib/i686/libc.so.6
  3 Thread 16386 (LWP 27294)  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
  2 Thread 32769 (LWP 27293)  0x420db1a7 in poll () from /lib/i686/libc.so.6
  1 Thread 16384 (LWP 27290)  0x420b0226 in nanosleep () from /lib/i686/libc.so.6

(gdb) thread 4
[Switching to thread 4 (Thread 32771 (LWP 27295))]#0  0x420cd207 in sched_yield ()
   from /lib/i686/libc.so.6
(gdb) info stac
#0  0x420cd207 in sched_yield () from /lib/i686/libc.so.6
#1  0x4008ff47 in freeHostSessions (host=0x41a1ebe8, theDevice=0) at hash.c:176
#2  0x400900a8 in freeHostInfo (host=0x41a1ebe8, actualDeviceId=0) at hash.c:232
#3  0x40090bd6 in purgeIdleHosts (actDevice=0) at hash.c:545
#4  0x40097f19 in scanIdleLoop (notUsed=0x0) at ntop.c:588
#5  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0
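
One thing worth keeping in mind when reading this trace: sched_yield() gives up
the CPU, not any mutex the caller holds. Whether purgeIdleHosts() actually holds
a lock at this point is an assumption here, not something the trace alone
proves. The sketch below (hypothetical function name, not ntop code) only shows
that a default pthread mutex is still busy after its owner has yielded.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Lock a mutex, yield, then probe it. Returns the pthread_mutex_trylock()
 * result seen while the lock is still held (EBUSY for a default mutex),
 * or the error code if the initial lock fails. */
static int probe_lock_across_yield(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc = pthread_mutex_lock(&m);
    if (rc != 0)
        return rc;

    sched_yield();                      /* gives up the CPU, not the mutex */

    rc = pthread_mutex_trylock(&m);     /* mutex is still held at this point */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

So any thread waiting on a lock held across the yield stays blocked until the
holder explicitly unlocks, however often the holder yields.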


Other threads that touch mutexes, semaphores, or anything else POSIX are hung,
even though the mutex/semaphore shows as not locked:

[Switching to thread 7 (Thread 81926 (LWP 27298))]#0  0x40250a35 in __pthread_sigsuspend ()
   from /lib/i686/libpthread.so.0
(gdb) info stack
#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x40252190 in __pthread_alt_lock () from /lib/i686/libpthread.so.0
#3  0x4024ed77 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage",
    fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
#5  0x4003c09e in handleHTTPrequest (from={s_addr = 3232246305}) at http.c:2590
#6  0x40068091 in handleSingleWebConnection (fdmask=0x4413aa3c) at webInterface.c:5423
#7  0x40067ed6 in handleWebConnections (notUsed=0x0) at webInterface.c:5288
#8  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0

(gdb) thread 3
[Switching to thread 3 (Thread 16386 (LWP 27294))]#0  0x40250a35 in __pthread_sigsuspend ()
   from /lib/i686/libpthread.so.0
(gdb) info stac
#0  0x40250a35 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x4024fdb8 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4025163b in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
#3  0x400a8dd8 in waitSem (semId=0x400b8874) at util.c:1476
#4  0x4009d03a in dequeuePacket (notUsed=0x0) at pbuf.c:1693
#5  0x4024e881 in pthread_start_thread () from /lib/i686/libpthread.so.0


Oddly, ntop's extra data shows the mutex locked, but the internal flag
(__m_lock) shows some weird status...

(gdb) frame 4
#4  0x400a86fe in _accessMutex (mutexId=0x400b8ba8, where=0x4006a8df "returnHTTPPage",
    fileName=0x40069e3d "http.c", fileLine=2590) at util.c:1143
1143      rc = pthread_mutex_lock(&(mutexId->mutex));
(gdb) list
1138
1139      strcpy(mutexId->lockAttemptFile, fileName);
1140      mutexId->lockAttemptLine=fileLine;
1141      mutexId->lockAttemptPid=myPid;
1142
1143      rc = pthread_mutex_lock(&(mutexId->mutex));
1144
1145      pthread_mutex_lock(&stateChangeMutex);
1146      mutexId->lockAttemptFile[0] = '\0';
1147      mutexId->lockAttemptLine=0;
(gdb) print *mutexId
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {
      __status = 1142136828, __spinlock = 0}}, isLocked = 1 '\001', isInitialized = 1 '\001',
  lockFile = "hash.c", '\0' <repeats 57 times>, lockLine = 465, lockPid = 27295,
  unlockFile = "http.c", '\0' <repeats 57 times>, unlockLine = 2618, unlockPid = 27298,
  numLocks = 361, numReleases = 360, lockTime = 1061844745,
  maxLockedDurationUnlockFile = "http.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 2618, maxLockedDuration = 1,
  where = "purgeIdleHosts", '\0' <repeats 49 times>,
  lockAttemptFile = "http.c", '\0' <repeats 57 times>, lockAttemptLine = 2590,
  lockAttemptPid = 27298}

vs. a normal LOCKED mutex:

(gdb) print myGlobals.packetProcessMutex
$4 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {
      __status = 1, __spinlock = 0}}, isLocked = 1 '\001', isInitialized = 1 '\001',
  lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 1591, lockPid = 2849,
  unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 1602, unlockPid = 2849,
  numLocks = 50182, numReleases = 50181, lockTime = 1061847839,
  maxLockedDurationUnlockFile = "pbuf.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 1602, maxLockedDuration = 5,
  where = "queuePacket\000t", '\0' <repeats 50 times>,
  lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>, lockAttemptLine = 0,
  lockAttemptPid = 0}


and UNLOCKED:

(gdb) print myGlobals.purgePortsMutex
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {
      __status = 0, __spinlock = 0}}, isLocked = 0 '\0', isInitialized = 1 '\001',
  lockFile = "pbuf.c", '\0' <repeats 57 times>, lockLine = 0, lockPid = 2849,
  unlockFile = "pbuf.c", '\0' <repeats 57 times>, unlockLine = 591, unlockPid = 2849,
  numLocks = 24424, numReleases = 24424, lockTime = 1061847839,
  maxLockedDurationUnlockFile = "ntop.c", '\0' <repeats 57 times>,
  maxLockedDurationUnlockLine = 572, maxLockedDuration = 1,
  where = "updateInterfacePorts", '\0' <repeats 43 times>,
  lockAttemptFile = "\000buf.c", '\0' <repeats 57 times>, lockAttemptLine = 0,
  lockAttemptPid = 0}
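
On the odd __status in the first dump: 1142136828 is 0x44139ffc in hex, which
sits 2624 bytes below the fdmask=0x4413aa3c argument in thread 7's stack trace,
so it looks more like an address on that thread's stack than a flag. It may be
that LinuxThreads keeps a pointer to its list of waiters in __status while a
lock is contended, but treat that as an assumption rather than an established
fact. The arithmetic, as a small sketch (both function names are made up):

```c
#include <stdio.h>

/* Distance in bytes from a __status value (read as an address) up to
 * another address taken from the same gdb session. */
static unsigned long bytes_below(unsigned long status, unsigned long addr)
{
    return addr - status;
}

/* Print the value in hex next to the reference address. */
static void show_status(unsigned long status, unsigned long addr)
{
    printf("__status = 0x%lx, %lu bytes below 0x%lx\n",
           status, bytes_below(status, addr), addr);
}
```

Calling show_status(1142136828UL, 0x4413aa3cUL) reproduces the comparison; the
same check against the LOCKED and UNLOCKED dumps is pointless, since their
__status values are just 1 and 0.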



Is anybody else having problems with 2.2.94 or another fairly recent version?

-----Burton
