[Ntop-dev] Mutexs - Lock Holder stuff

Burton M. Strauss III Fri, 20 Dec 2002 06:03:15 -0800

Let me clarify this...

#0 relates to who holds the lock.  My point was that the POSIX stuff only
protects the pthread_mutex item itself.  Adding the super mutex the way you
did only protects isLocked, so it's THEORETICALLY possible (but highly
unlikely) to get the other add-on values stepped on.  It's very unlikely, so
I was asking if holding the supermutex for the few extra instructions was
worth the very slight risk.  Since we don't make any processing decisions
based on the add-on values, I don't feel strongly either way.


#1 & #2 - unfortunately, you've missed the point of the lockAttempt stuff.
The possibility of the data being stepped on clearly IS there, but that's
why I defined it as a "light-weight" scheme.  It's not designed to give 100%
accurate information, but rather to elicit a place to START looking IF a
deadlock occurs.

The idea is to put a canary in the field, set the mutex and then one of two
things happens:

a. You get the mutex and clear the canary.
b. You block and the canary is set until either a) somebody else blocks or
b) the mutex clears and you unblock.

During the time you are blocked, either your canary data is in the fields
for the web server info.html/textinfo.html page to display OR somebody
else's data is, who is ALSO (probably) blocked.

The scheme isn't perfect, but for most cases, it shows you that thread(s)
are blocking and gives you a place to start looking, at a very light
processing cost.  We're not doing this for perfect information, but rather
to get a broad look at deadlocks across a lot of machines/instances at low
cost.  Sure, I could do better, but that would a) cost more per lock and b)
not give much extra info.

Normal example (time flows down)

Thread   #1           #2
         setcanary
         lock
           getslock
         clearcanary
                      setscanary
         <processes>
                      lock
                        blocks
         unlock
                        getslock
                      clearcanary
                      processes
                      unlock

Sure, any number of pathological cases COULD occur, but that's the normal
one:

During the time #2 is blocked, any web server display of
info.html/textinfo.html shows it's blocked and where.

A single instantaneous value in "blocked" isn't a deadlock.  It's just the
natural blocking of a thread waiting for a resource.  But we've now given
the user a way to see if s/he is deadlocked!  All the user has to do is
refresh info.html and see if the block cleared and/or the # of locks/unlocks
goes up.  If either of those is happening, you're fine.  If after 15 or 30s
it hasn't cleared, then ntop is deadlocked, and the user can give us a place
to START looking.

Can the canary be fooled ... sure:

Thread   #1           #2           #3
         setcanary
         lock
           getslock
         clearcanary
                      setscanary
                                   setscanary
         <processes>
                      lock
                        blocks
                                   lock
                                     blocks           POINT A
         unlock
                                     getslock
                                   clearcanary        POINT B
                                   processes
                                   unlock
                        getslock
                      clearcanary
                      processes
                      unlock


At point A, the info.html correctly reports thread #3 blocked, although
thread #2 is really the one that blocked first, it's data is overlaid.
Still, if you go looking for where thread #3 blocks on thread #1 (because
you know who holds the lock from the isLocked data), you're in the right
ball park.

At point B, the canary data is cleared even though you're still blocked on
thread #2.

And that's the potential flaw - the scheme is too lightweight for the 3
thread case.  But it's good enough for most uses.


-----Burton



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf
Of Luca Deri
Sent: Friday, December 20, 2002 2:46 AM
To: [EMAIL PROTECTED]
Subject: Re: [Ntop-dev] New ntop commit (author deri)


Burton M. Strauss III wrote:

>I understand WHAT you're trying to do, but I have some questions....
>
>0) Single threading ntop on the isLocked setting should prevent us from
>stepping on it, but do we want to expand the protected area to be all of
the
>lock values?  It's theoretically possible to get the other value stepped
on,
>although I'll grant it's probably impossible in real life...
>
>
I can't follow you here. Can you elaborate please?

>
>1) it looks like you've butchered the lockAttempt stuff...
>
>You moved the setting of the field AFTER the lock of the mutex.  The point
>of the change was to load the field BEFORE, so that if the
>pthread_mutex_lock() blocked, the lockAttempt (blocked) stuff was set and
>could be retrieved by other threads (e.g. the web server for info.html).
>
>The code in _accessMutex() should look like this:
>
>  strcpy(mutexId->lockAttemptFile, fileName);
>  mutexId->lockAttemptLine=fileLine;
>
>  rc = pthread_mutex_lock(&(mutexId->mutex));
>
>  mutexId->lockAttemptFile[0] = '\0';
>  mutexId->lockAttemptLine=0;
>
>
>2) Why did you move the self locked mutex test?
>
>According to the POSIX standard, it is not required to report errors such
as
>attempting to lock a mutex you've already locked.  In fact, just about ANY
>behavior is possible - so I wanted the log message to occur BEFORE the code
>(potentially) deadlocked.
>
>http://www.opengroup.org/onlinepubs/7908799/xsh/threads.html and
>specifically
>http://www.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html
>
>"If the mutex type is PTHREAD_MUTEX_NORMAL, deadlock detection is not
>provided. Attempting to relock the mutex causes deadlock. If a thread
>attempts to unlock a mutex that it has not locked or a mutex which is
>unlocked, undefined behavior results.
>
>If the mutex type is PTHREAD_MUTEX_ERRORCHECK, then error checking is
>provided. If a thread attempts to relock a mutex that it has already
locked,
>an error will be returned. If a thread attempts to unlock a mutex that it
>has not locked or a mutex which is unlocked, an error will be returned.
>
>If the mutex type is PTHREAD_MUTEX_RECURSIVE, then the mutex maintains the
>concept of a lock count. When a thread successfully acquires a mutex for
the
>first time, the lock count is set to one. Every time a thread relocks this
>mutex, the lock count is incremented by one. Each time the thread unlocks
>the mutex, the lock count is decremented by one. When the lock count
reaches
>zero, the mutex becomes available for other threads to acquire. If a thread
>attempts to unlock a mutex that it has not locked or a mutex which is
>unlocked, an error will be returned.
>
>If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock
>the mutex results in undefined behavior. Attempting to unlock the mutex if
>it was not locked by the calling thread results in undefined behavior.
>Attempting to unlock the mutex if it is not locked results in undefined
>behavior."
>
>Since we init w/ NULL (PTHREAD_MUTEX_DEFAULT) behavior COULD be
>PTHREAD_MUTEX_NORMAL which would deadlock.  I don't think this would cause
a
>problem either way with any of our normal environments, but with two new
>candidate gcc thread libraries on the horizon, plus this being an undefined
>behavior situation, I wanted to make ntop's stuff clear.
>
>
>
Mutexes are used because two or more concurrent threads try to access
the same resource. This means that those threads can all access at the
same time the same resource. If two threads are calling concurrently
_accessMutex() they will both change mutexId->lockAttempt*, the last
thread will overwrite  the values, but only the first thread will
actually get the mutex as the second one (the one that overwrote the
value) will be stuck at pthread_mutex_lock. Therefore:
1. modify values (e.g. mutexId->lockAttempt*) only when it's time to do
it (-> when you've got the lock on the mutex)
2. do not trust values such as mutexId->isLocked unless you're sure that
the value does not change. For instance if you write:

if(mutexId->isLocked) {
    print "Error!"
}
lockMutex(...)

it's a wrong statement *unless* you have a super-mutex that makes sure
that the isLocked values does not change

Bottom line: you've done a great work with mutexes. I have just fixed some
issues.


Regards, Luca

>
>I have a patch I was working on to add the PID# to the reports and lock
>data.  I'll include the fixes for #1 and #2 above in that fix (#181).  #0
>I'll leave until I hear from you...
>
>
>-----Burton
>
>





>
>
>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf
>Of [EMAIL PROTECTED]
>Sent: Thursday, December 19, 2002 4:15 AM
>To: [EMAIL PROTECTED]
>Subject: [Ntop-dev] New ntop commit (author deri)
>
>
>Update of /export/home/ntop/ntop
>In directory jabber:/tmp/cvs-serv5092
>
>Modified Files:
>       hash.c ntop.c sessions.c util.c webInterface.c
>Log Message:
>Added fixes for mutex warnings. A new "super mutex" named stateChangeMutex
>has been added in order to serialize activities on mutexes.
>
>
>
>_______________________________________________
>Ntop-dev mailing list
>[EMAIL PROTECTED]
>http://listgateway.unipi.it/mailman/listinfo/ntop-dev
>
>_______________________________________________
>Ntop-dev mailing list
>[EMAIL PROTECTED]
>http://listgateway.unipi.it/mailman/listinfo/ntop-dev
>
>


--
Luca Deri <[EMAIL PROTECTED]>       http://luca.ntop.org/
Hacker: someone who loves to program and enjoys being
clever about it - Richard Stallman


_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

_______________________________________________
Ntop-dev mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

[Ntop-dev] Mutexs - Lock Holder stuff

Reply via email to