(Yeah, this message is in html.  I don't usually do it, but for showing an example of an hmtl page, it's the easiest way...)

For tracking down deadlocks, I've added this new page http://127.0.0.1:3000/showMutex.html to the version in the ntop cvs.  (previously, the mutex report was only generated if the -K flag was set, which has other consequences).

Note that just because something is 'blocked' for an instant doesn't imply a deadlock.  That is COMPLETELY NORMAL behavior.

The normal output looks like this:

Mutex Name State Last Lock Blocked Last UnLock # Locks/Releases Max Lock
gdbmMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST util.c:3386(13349)   util.c:3393(13349) 100 100 0 sec [util.c:3413]
packetProcessMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST pbuf.c:1631(13350)   pbuf.c:1646(13350) 477 477 1 sec [pbuf.c:1646]
packetQueueMutex unlocked at Fri 31 Oct 2003 10:53:37 AM CST pbuf.c:1744(13344)   pbuf.c:1764(13344) 144 144 0 sec [pbuf.c:1689]
purgeMutex locked at Fri 31 Oct 2003 10:53:41 AM CST http.c:2584(13349)   :0(0) 1 0 0 sec [:0]
hostsHashMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST pbuf.c:1998(13350)   pbuf.c:2703(13350) 477 477 1 sec [pbuf.c:2703]
tcpSessionsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST sessions.c:584(13350)   sessions.c:1975(13350) 30 30 0 sec [sessions.c:1975]
purgePortsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST pbuf.c:597(13350)   pbuf.c:618(13350) 57 57 0 sec [pbuf.c:618]
securityItemsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST http.c:2210(13349)   http.c:2223(13349) 2 2 0 sec [http.c:2205]

  The give-away of a deadlock would be a line like this:

Mutex Name State Last Lock Blocked Last UnLock # Locks/Releases Max Lock
gdbmMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST util.c:3386(13349)   util.c:3393(13349) 100 100 0 sec [util.c:3413]
packetProcessMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST pbuf.c:1631(13350)   pbuf.c:1646(13350) 477 477 1 sec [pbuf.c:1646]
packetQueueMutex unlocked at Fri 31 Oct 2003 10:53:37 AM CST pbuf.c:1744(13344)   pbuf.c:1764(13344) 144 144 0 sec [pbuf.c:1689]
purgeMutex locked at Fri 31 Oct 2003 10:53:41 AM CST http.c:2584(13349)   :0(0) 1 0 0 sec [:0]
hostsHashMutex locked at Fri 31 Oct 2003 10:52:17 AM CST hash.c:552(13350) pbuf.c:1998 (13355)  hash.c:601 (13350) 477 477 1 sec [pbuf.c:2703]
tcpSessionsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST sessions.c:584(13350)   sessions.c:1975(13350) 30 30 0 sec [sessions.c:1975]
purgePortsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST pbuf.c:597(13350)   pbuf.c:618(13350) 57 57 0 sec [pbuf.c:618]
securityItemsMutex unlocked at Fri 31 Oct 2003 10:53:41 AM CST http.c:2210(13349)   http.c:2223(13349) 2 2 0 sec [http.c:2205]

See how the 'at' tag is OLD and there's something in the blocked line?  If you refresh a few times over 30s or a minute and that doesn't change, you're deadlocked.

The KEY to reporting these is to go into the source and pull up the actual lines - plus some context - where the Last Lock, Last UnLock and Block show up.  (Since the cvs changes often, line #s aren't very useful without the corresponding code).

For example, the report of the above should be:

Deadlock (web server active), showMutex.html shows hostsHashMutex, locked at hash.c:552:

  for(idx=0; idx<myGlobals.device[actDevice].actualHashSize; idx++) {
#ifdef CFG_MULTITHREADED
    accessMutex(&myGlobals.hostsHashMutex, "scanIdleLoop");
#endif

Last Unlock at hash.c:601:

      if(numHosts >= (maxHosts-1)) break;

Blocked at pbuf.c:1998:

    incrementTrafficCounter(&myGlobals.device[actualDeviceId].rcvdPktStats.tooLong, 1);
  }

#ifdef CFG_MULTITHREADED
  accessMutex(&myGlobals.hostsHashMutex, "processPacket");
#endif

#ifdef DEBUG
  traceEvent(CONST_TRACE_INFO, "actualDeviceId = %d", actualDeviceId);
#endif

 

-----Burton

 

Reply via email to