Hi all;
Here are more things that I discovered. I am quite sure that this is a MUTEX
problem. I analyzed the behavior of the 3.3.10 code and the 4.0.1 code in GDB,
so I could see clearly that the MUTEX is not being locked on 4.0.1:
GDB NTOP 4.0.1 does not lock mutex:
Breakpoint 1, queuePacket (_deviceId=<value optimized out>, h=0x45007050,
p=0x2aaaad590042 "") at pbuf.c:2510
2510 if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex,
"queuePacket") == 0) {
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$1 = {isInitialized = 1 '\001', mutex = {__data = {__lock = 0, __count = 0,
__owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39
times>, __align = 0}}
(gdb) next
Fri Sep 10 18:53:21 2010 THREADMGMT[t1094719808]: SIH: Idle host scan thread
running [p21033]
Fri Sep 10 18:53:21 2010 THREADMGMT[t1084229952]: SFP: Fingerprint scan thread
running [p21033]
2514 myGlobals.receivedPacketsProcessed++;
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$2 = {isInitialized = 1 '\001', mutex = {__data = {__lock = 0, __count = 0,
__owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__list = {__prev = 0x523d, __next = 0x0}}, __size = '\000' <repeats 24
times>, "=R", '\000' <repeats 13 times>, __align = 0}}
(gdb)
****************You can see that {__lock = 0 is not set after calling
tryLockMutex.
GDB NTOP 3.3.10 locks mutex
Breakpoint 1, queuePacket (_deviceId=<value optimized out>, h=0x42802e50,
p=0x2aaaae08b042 "") at pbuf.c:2476
2476 if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex,
"queuePacket") == 0) {
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$1 = {mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
statedatamutex = {__data = {__lock = 0, __count = 0,
__owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev =
0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
__align = 0}, isLocked = 0 '\000', isInitialized = 1 '\001', numLocks = 0,
numReleases = 0, attempt = {time = {tv_sec = 0,
tv_usec = 0}, pid = 0, thread = 0, line = 0, file = "\000\000\000\000"},
lock = {time = {tv_sec = 0, tv_usec = 0}, pid = 0,
thread = 0, line = 0, file = "\000\000\000\000"}, unlock = {time = {tv_sec
= 0, tv_usec = 0}, pid = 0, thread = 0, line = 0,
file = "\000\000\000\000"}, max = {time = {tv_sec = 0, tv_usec = 0}, pid =
0, thread = 0, line = 0, file = "\000\000\000\000"},
maxLockedDuration = 0}
(gdb) next
Fri Sep 10 18:55:09 2010 THREADMGMT[t1084229952]: SFP: Fingerprint scan thread
running [p25640]
2480 myGlobals.receivedPacketsProcessed++;
(gdb) print myGlobals.device[deviceId].packetProcessMutex
$2 = {mutex = {__data = {__lock = 1, __count = 0, __owner = 25653, __nusers =
1, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size =
"\001\000\000\000\000\000\000\000\065d\000\000\001", '\000' <repeats 26 times>,
__align = 1},
statedatamutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers =
0, __kind = 0, __spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
isLocked = 1 '\001', isInitialized = 1 '\001', numLocks = 1,
numReleases = 0, attempt = {time = {tv_sec = 0, tv_usec = 0}, pid = 0, thread
= 0, line = 0, file = "\000\000\000\000"}, lock = {
time = {tv_sec = 0, tv_usec = 0}, pid = 0, thread = 0, line = 0, file =
"\000\000\000\000"}, unlock = {time = {tv_sec = 0,
tv_usec = 0}, pid = 0, thread = 0, line = 0, file = "\000\000\000\000"},
max = {time = {tv_sec = 0, tv_usec = 0}, pid = 0,
thread = 0, line = 0, file = "\000\000\000\000"}, maxLockedDuration = 0}
(gdb)
****************You can see that {__lock = 1 is set after calling tryLockMutex.
After that, I recompiled the code more carefully and found some compile
WARNINGS that I have ignored the first time:
util.c: In function '_deleteMutex':
util.c:1927: warning: passing argument 1 of 'pthread_rwlock_destroy' from
incompatible pointer type
util.c: In function '_tryLockMutex':
util.c:2078: warning: passing argument 1 of 'pthread_rwlock_trywrlock' from
incompatible pointer type
Well, the good thing is that I think the problem is found. The bad thing is
that I checked the pointer type and it seems right:
For "int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock)" is passed
&mutexId->mutex, and mutex is pthread_rwlock_t type. I am not an expert C++
programmer but I think it is right.
Well, if is anybody still reading this, any ideas? Maybe a compiler problem?
Even if no one is out there, I hope this info can help a future wanderer
through the ntop-dev archives.... :)
Thanks.
Rafael Sarres de Almeida
-----Mensagem original-----
De: [email protected]
[mailto:[email protected]] Em nome de Luca Deri
Enviada em: terça-feira, 31 de agosto de 2010 06:03
Para: [email protected]
Assunto: Re: [Ntop-dev] RES: RES: Ntop processing only one packet
Rafael
thanks for debugging the code. The software works as follows:
- are we the only one processing packets? If so (i.e. no other threads
are doing this) then process the packet immediately. This turns into
if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex,
"queuePacket") == 0) {
/* Locked so we can process the packet now */
.....
processPacket(_deviceId, h, p1);
releaseMutex(&myGlobals.device[deviceId].packetProcessMutex);
return;
}
- if another thread is processing packets already, we need to queue the
packet
/*
If we reach this point it means that somebody was already
processing a packet so we need to queue it.
*/
if(myGlobals.device[deviceId].packetQueueLen >=
CONST_PACKET_QUEUE_LENGTH) {
...
}
In this second case ntop notifies the dequeue thread that there's a
packet to process
signalCondvar(&myGlobals.device[deviceId].queueCondvar);
Now my question is: are you sure that for some reason the dequeue thread
isn't looping or isn't really awake? Can you please check what happens
in (pbuf.c)
void* dequeuePacket(void* _deviceId) {
}
Just enable the traces (around #ifdef DEBUG) to see what happens there.
Cheers Luca
On 08/30/2010 11:26 PM, Rafael Sarres de Almeida wrote:
> Hi Luca;
> Just to add more info to my previous mail:
> I gdb the code in the first packet, seems like that the releaseMutex
> (2538,pbuf.c) is not releasing. I followed the code, it calls the
> realeaseMutex function after it processes the first packet, but on the next
> loop, the tryLockMutex (2510,pbuf.c) fails, so the program thinks the mutex
> is not released. Here is the debug:
>
>
> Breakpoint 4, queuePacket (_deviceId=<value optimized out>, h=0x45007050,
> p=0x2aaaad590042 "") at pbuf.c:2510
> 2510 if(tryLockMutex(&myGlobals.device[deviceId].packetProcessMutex,
> "queuePacket") == 0) {
>
> *********** It is going to process first packet if Mutex is not locked.
>
>
> (gdb) step
> Mon Aug 30 18:09:35 2010 THREADMGMT[t1094719808]: SIH: Idle host scan thread
> running [p12154]
> _tryLockMutex (mutexId=0x2aaaab7e1150, where=0x2aaaaad8d59d "queuePacket",
> fileName=0x2aaaaad8d41a "pbuf.c", fileLine=2510)
> at util.c:2078
> 2078 return(pthread_rwlock_trywrlock(&mutexId->mutex));
> (gdb)
> [New Thread 0x45a08940 (LWP 12189)]
> Mon Aug 30 18:09:42 2010 THREADMGMT[t1168148800]: RRD: Started thread for
> throughput data collection
> Mon Aug 30 18:09:42 2010 THREADMGMT[t1147169088]: RRD: Data collection
> thread running [p12154]
> Mon Aug 30 18:09:42 2010 THREADMGMT[t1168148800]: RRD: Throughput data
> collection: Thread starting [p12154]
> Mon Aug 30 18:09:42 2010 THREADMGMT[t1168148800]: RRD: Throughput data
> collection: Thread running [p12154]
> 0x000000304fa0a760 in pthread_rwlock_trywrlock () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function pthread_rwlock_trywrlock,
> which has no line number information.
> queuePacket (_deviceId=<value optimized out>, h=0x45007050, p=0x2aaaad590042
> "") at pbuf.c:2514
> 2514 myGlobals.receivedPacketsProcessed++;
> (gdb) break 2538
> Breakpoint 5 at 0x2aaaaad595a7: file pbuf.c, line 2538.
> (gdb) continue
> Continuing.
>
> Breakpoint 5, queuePacket (_deviceId=<value optimized out>, h=0x45007050,
> p=0x2aaaad590042 "") at pbuf.c:2538
> 2538 releaseMutex(&myGlobals.device[deviceId].packetProcessMutex);
>
> ***************Releasing MUTEX
>
> (gdb) step
> _releaseMutex (mutexId=0x2aaaab7e1150, fileName=0x2aaaaad8d41a "pbuf.c",
> fileLine=2538) at util.c:2156
> 2156 return(pthread_rwlock_unlock(&mutexId->mutex));
> (gdb)
> 0x000000304eedfa10 in pthread_mutex_unlock () from /lib64/libc.so.6
> (gdb)
> Single stepping until exit from function pthread_mutex_unlock,
> which has no line number information.
> 0x000000304fa0a020 in pthread_mutex_unlock () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function pthread_mutex_unlock,
> which has no line number information.
> 0x000000304fa0a0d8 in _L_unlock_766 () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function _L_unlock_766,
> which has no line number information.
> 0x000000304fa0d5e0 in __lll_unlock_wake () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function __lll_unlock_wake,
> which has no line number information.
> 0x000000304fa0a0e7 in _L_unlock_766 () from /lib64/libpthread.so.0
> (gdb)
> Single stepping until exit from function _L_unlock_766,
> which has no line number information.
> 0x000000304fa0a04e in pthread_mutex_unlock () from /lib64/libpthread.so.0
> (gdb)
>
>
> Any ideas?
>
> Rafael Sarres de Almeida
>
>
_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev
_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev