[Ntop-dev] RES: RES: RES: Ntop processing only one packet

Rafael Sarres de Almeida Fri, 10 Sep 2010 15:17:24 -0700

Hi all;
Here are more things that I discovered. I am quite sure that this is a MUTEX 
problem. I analyzed the behavior of the 3.3.10 code and the 4.0.1 code in GDB, 
so I could see clearly that the MUTEX is not being locked on 4.0.1:


GDB NTOP 4.0.1 does not lock mutex:

Breakpoint 1, queuePacket (_deviceId=<value optimized out>, h=0x45007050, 
p=0x2aaaad590042 "") at pbuf.c:2510
2510      if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex, 
"queuePacket") == 0) {
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$1 = {isInitialized = 1 '\001', mutex = {__data = {__lock = 0, __count = 0, 
__owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
      __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 
times>, __align = 0}}


(gdb) next
Fri Sep 10 18:53:21 2010  THREADMGMT[t1094719808]: SIH: Idle host scan thread 
running [p21033]
Fri Sep 10 18:53:21 2010  THREADMGMT[t1084229952]: SFP: Fingerprint scan thread 
running [p21033]
2514        myGlobals.receivedPacketsProcessed++;
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$2 = {isInitialized = 1 '\001', mutex = {__data = {__lock = 0, __count = 0, 
__owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
      __list = {__prev = 0x523d, __next = 0x0}}, __size = '\000' <repeats 24 
times>, "=R", '\000' <repeats 13 times>, __align = 0}}
(gdb) 

****************You can see that {__lock = 0 is not set after calling 
tryLockMutex.


GDB NTOP 3.3.10 locks mutex

Breakpoint 1, queuePacket (_deviceId=<value optimized out>, h=0x42802e50, 
p=0x2aaaae08b042 "") at pbuf.c:2476
2476      if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex, 
"queuePacket") == 0) {
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$1 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
__kind = 0, __spins = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
statedatamutex = {__data = {__lock = 0, __count = 0, 
      __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 
0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
    __align = 0}, isLocked = 0 '\000', isInitialized = 1 '\001', numLocks = 0, 
numReleases = 0, attempt = {time = {tv_sec = 0, 
      tv_usec = 0}, pid = 0, thread = 0, line = 0, file = "\000\000\000\000"}, 
lock = {time = {tv_sec = 0, tv_usec = 0}, pid = 0, 
    thread = 0, line = 0, file = "\000\000\000\000"}, unlock = {time = {tv_sec 
= 0, tv_usec = 0}, pid = 0, thread = 0, line = 0, 
    file = "\000\000\000\000"}, max = {time = {tv_sec = 0, tv_usec = 0}, pid = 
0, thread = 0, line = 0, file = "\000\000\000\000"}, 
  maxLockedDuration = 0}


(gdb) next
Fri Sep 10 18:55:09 2010  THREADMGMT[t1084229952]: SFP: Fingerprint scan thread 
running [p25640]
2480        myGlobals.receivedPacketsProcessed++;
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$2 = {mutex = {__data = {__lock = 1, __count = 0, __owner = 25653, __nusers = 
1, __kind = 0, __spins = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = 
"\001\000\000\000\000\000\000\000\065d\000\000\001", '\000' <repeats 26 times>, 
__align = 1}, 
  statedatamutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 
0, __kind = 0, __spins = 0, __list = {__prev = 0x0, 
        __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
isLocked = 1 '\001', isInitialized = 1 '\001', numLocks = 1, 
  numReleases = 0, attempt = {time = {tv_sec = 0, tv_usec = 0}, pid = 0, thread 
= 0, line = 0, file = "\000\000\000\000"}, lock = {
    time = {tv_sec = 0, tv_usec = 0}, pid = 0, thread = 0, line = 0, file = 
"\000\000\000\000"}, unlock = {time = {tv_sec = 0, 
      tv_usec = 0}, pid = 0, thread = 0, line = 0, file = "\000\000\000\000"}, 
max = {time = {tv_sec = 0, tv_usec = 0}, pid = 0, 
    thread = 0, line = 0, file = "\000\000\000\000"}, maxLockedDuration = 0}
(gdb) 

****************You can see that {__lock = 1 is set after calling tryLockMutex.



After that, I recompiled the code more carefully and found some compile 
WARNINGS that I have ignored the first time: 

util.c: In function '_deleteMutex':
util.c:1927: warning: passing argument 1 of 'pthread_rwlock_destroy' from 
incompatible pointer type
util.c: In function '_tryLockMutex':
util.c:2078: warning: passing argument 1 of 'pthread_rwlock_trywrlock' from 
incompatible pointer type


Well, the good thing is that I think the problem is found. The bad thing is 
that I checked the pointer type and it seems right: 
For "int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock)" is passed 
&mutexId->mutex, and mutex is pthread_rwlock_t type. I am not an expert C++ 
programmer but I think it is right.

Well, if is anybody still reading this, any ideas? Maybe a compiler problem? 
Even if no one is out there, I hope this info can help a future wanderer 
through the ntop-dev archives....  :)

Thanks.
        Rafael Sarres de Almeida 
        


-----Mensagem original-----
De: [email protected] 
[mailto:[email protected]] Em nome de Luca Deri
Enviada em: terça-feira, 31 de agosto de 2010 06:03
Para: [email protected]
Assunto: Re: [Ntop-dev] RES: RES: Ntop processing only one packet

Rafael
thanks for debugging the code. The software works as follows:
- are we the only one processing packets? If so (i.e. no other threads 
are doing this) then process the packet immediately. This turns into

  if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex, 
"queuePacket") == 0) {
     /* Locked so we can process the packet now */
.....
   processPacket(_deviceId, h, p1);
     releaseMutex(&myGlobals.device[deviceId].packetProcessMutex);
     return;
}

- if another thread is processing packets already, we need to queue the 
packet
  /*
     If we reach this point it means that somebody was already
     processing a packet so we need to queue it.
   */
   if(myGlobals.device[deviceId].packetQueueLen >= 
CONST_PACKET_QUEUE_LENGTH) {
...
}

In this second case ntop notifies the dequeue thread that there's a 
packet to process

  signalCondvar(&myGlobals.device[deviceId].queueCondvar);

Now my question is: are you sure that for some reason the dequeue thread 
isn't looping or isn't really awake? Can you please check what happens 
in (pbuf.c)

void* dequeuePacket(void* _deviceId) {
}

Just enable the traces (around #ifdef DEBUG) to see what happens there.

Cheers Luca



On 08/30/2010 11:26 PM, Rafael Sarres de Almeida wrote:
> Hi Luca;
> Just to add more info to my previous mail:
> I gdb the code in the first packet, seems like that the releaseMutex 
> (2538,pbuf.c) is not releasing. I followed the code, it calls the 
> realeaseMutex function after it processes the first packet, but on the next 
> loop, the tryLockMutex (2510,pbuf.c) fails, so the program thinks the mutex 
> is not released. Here is the debug:
>
>
> Breakpoint 4, queuePacket (_deviceId=<value optimized out>, h=0x45007050, 
> p=0x2aaaad590042 "") at pbuf.c:2510
> 2510      if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex, 
> "queuePacket") == 0) {
>
> *********** It is going to process first packet if Mutex is not locked.
>
>
> (gdb) step
> Mon Aug 30 18:09:35 2010  THREADMGMT[t1094719808]: SIH: Idle host scan thread 
> running [p12154]
> _tryLockMutex (mutexId=0x2aaaab7e1150, where=0x2aaaaad8d59d "queuePacket", 
> fileName=0x2aaaaad8d41a "pbuf.c", fileLine=2510)
>      at util.c:2078
> 2078      return(pthread_rwlock_trywrlock(&mutexId->mutex));
> (gdb)
> [New Thread 0x45a08940 (LWP 12189)]
> Mon Aug 30 18:09:42 2010  THREADMGMT[t1168148800]: RRD: Started thread for 
> throughput data collection
> Mon Aug 30 18:09:42 2010  THREADMGMT[t1147169088]: RRD: Data collection 
> thread running [p12154]
> Mon Aug 30 18:09:42 2010  THREADMGMT[t1168148800]: RRD: Throughput data 
> collection: Thread starting [p12154]
> Mon Aug 30 18:09:42 2010  THREADMGMT[t1168148800]: RRD: Throughput data 
> collection: Thread running [p12154]
> 0x000000304fa0a760 in pthread_rwlock_trywrlock () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function pthread_rwlock_trywrlock,
> which has no line number information.
> queuePacket (_deviceId=<value optimized out>, h=0x45007050, p=0x2aaaad590042 
> "") at pbuf.c:2514
> 2514        myGlobals.receivedPacketsProcessed++;
> (gdb) break 2538
> Breakpoint 5 at 0x2aaaaad595a7: file pbuf.c, line 2538.
> (gdb) continue
> Continuing.
>
> Breakpoint 5, queuePacket (_deviceId=<value optimized out>, h=0x45007050, 
> p=0x2aaaad590042 "") at pbuf.c:2538
> 2538        releaseMutex(&myGlobals.device[deviceId].packetProcessMutex);
>
> ***************Releasing MUTEX
>
> (gdb) step
> _releaseMutex (mutexId=0x2aaaab7e1150, fileName=0x2aaaaad8d41a "pbuf.c", 
> fileLine=2538) at util.c:2156
> 2156      return(pthread_rwlock_unlock(&mutexId->mutex));
> (gdb)
> 0x000000304eedfa10 in pthread_mutex_unlock () from /lib64/libc.so.6
> (gdb)
> Single stepping until exit from function pthread_mutex_unlock,
> which has no line number information.
> 0x000000304fa0a020 in pthread_mutex_unlock () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function pthread_mutex_unlock,
> which has no line number information.
> 0x000000304fa0a0d8 in _L_unlock_766 () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function _L_unlock_766,
> which has no line number information.
> 0x000000304fa0d5e0 in __lll_unlock_wake () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function __lll_unlock_wake,
> which has no line number information.
> 0x000000304fa0a0e7 in _L_unlock_766 () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function _L_unlock_766,
> which has no line number information.
> 0x000000304fa0a04e in pthread_mutex_unlock () from /lib64/libpthread.so.0
> (gdb)
>
>
> Any ideas?
>
>          Rafael Sarres de Almeida
>
>    

_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

[Ntop-dev] RES: RES: RES: Ntop processing only one packet

Reply via email to