[Ntop-dev] New ntop commit (author burton)

cvs-commit Wed, 16 Mar 2005 08:34:34 -0800

Update of /export/home/ntop/ntop
In directory unknown:/tmp/cvs-serv532

Modified Files:
        util.c 
Log Message:


At long last, the fix for the "self-LOCKED" mutex problem.

First off, as we suspected, it's not a problem, just an artifact of
the 'extra information' processes.  But it took months of work to
instrument the code and trap a problem case.

To 'fail' requires that ntop be monitoring more than one interface
(hence multiple pcap listening threads), AND packets have to be
processed (with mutex locks) in this pattern:

 A B A

with a context switch between threads occuring in a very narrow block 
of code (precisely TWO lines), specifically in this (slightly condensed) code
from _accessMutex (util.c):

  rc = pthread_mutex_lock(&(mutexId->mutex));

  pthread_mutex_lock(&stateChangeMutex);
  if(!myGlobals.runningPref.disableMutexExtraInfo) {
...
  }

  if(rc != 0)
    traceEvent(CONST_TRACE_ERROR, "accessMutex() call '%s' failed (rc=%d) 
[EMAIL PROTECTED]:%d]",
               where, rc, (void*)&(mutexId->mutex), fileName, fileLine);
  else {
    mutexId->numLocks++;
    mutexId->isLocked = 1;
>>HERE<<
    if(!myGlobals.runningPref.disableMutexExtraInfo) {
      mutexId->lockTime = time(NULL);
>>TOHERE<<
      mutexId->lockPid  = myPid;
...
    }
  }
  pthread_mutex_unlock(&stateChangeMutex);

The spurious message occurs when the mutex lock code checks the state of 
the extra data in the 2nd A lock, it finds the old A data, but finds isLocked 
as set by B.  And reports it as 'self-LOCKED'.

(A context switch before HERE means that isLocked isn't set, and one after
TOHERE means that the lock data has begin to be changed so it's not seen as
'self').

It's pretty rare, but it can happen.  Most often w/ SMP, but SMP is not required
to exhibit the behavior.


This fix does three things:

 * Moves the stored data into a struct and store only part of the filename.
   (this by itself is enough to fix most of the observed reports)
   Also, using a memcpy() on the struct instead of 4 separate moves.

 * Replace the global stateChangeMutex by a per-ntopMutex one.
   (This reduces some of the single threading problems)

 * Reworking the code so that the new statedatamutex is locked WHENEVER
   the state data is examined or updated.

The latter necesitated a 'trick' in the _accessMutex() code to prevent
deadlock.  The pthread_mutex_lock() call is replaced by this:

  /* Do the try first so we can free the statedatamutex if we're going to lock 
*/
  if((rc = pthread_mutex_trylock(&(mutexId->mutex))) == EBUSY) {
    pthread_mutex_unlock(&(mutexId->statedatamutex));
    rc = pthread_mutex_lock(&(mutexId->mutex));
    pthread_mutex_lock(&(mutexId->statedatamutex));
  }

So we do NOT hold the statedatamutex while waiting for the main lock!


The mutex report lines will now consist of this:

Tue 15 Mar 2005 12:21:30 PM CST
util:4336 p:16100 t:68225968

^^^^ - file name, first 3 letters (unique in all programs using mutexes)
     ^^^^ - line number
            ^^^^^ pid
                    ^^^^^^^^ thread number


Note that there remains a tiny problem window ... It becomes theoretically
possible for an attempt or a release to occur between the lock of the mutex 
and the lock of statedatamutex.

HOWEVER: We already hold the main mutex. So:

  A _releaseMutex() call shouldn't be made by this thread (we haven't 
      returned from the _accessMutex()/_tryLockMutex() yet!).

  A _releaseMutex() call by another thread is a bug anyway (this thread
      has the mutex)

  An _accessMutex() call by another thread will block (as it should)

  A _tryLockMutex() call by another thread will fail (also as it should).

With the last two, the 'attempt' state change data will be updated. If
that occurs, when our _accessMutex()/_tryLockMutex() clears the
'attempt' data, it will be lost.  That's pretty trivial.

There is no way around this since POSIX  does not provide support 
for extending mutexes (i.e. there is no way to make the two locks
atomic). 

-----Burton (ref 555)



_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

[Ntop-dev] New ntop commit (author burton)

Reply via email to