Yeah, that narrows the window, but it's still there.

The only real answer is to put mutex protection around the reference here
and in the purge.  I'm just concerned about timing issues: during the purge,
the mutex will be held for quite a while, so I'll probably have to use the
same type of periodic loop as in freeHostSessions().
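A minimal sketch of that idea in C with POSIX threads. The packet path does the NULL check, the create and the add under one mutex, and the purge frees slots in fixed-size chunks, releasing the mutex between chunks so the packet thread is never blocked for the whole table. The names portsMutex, getCounterLocked(), updatePorts(), purgePorts(), MAX_IP_PORTS and the simplified PortCounter are hypothetical stand-ins, not ntop's code, and for simplicity the sketch frees every slot, where a real purge would only drop idle ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_IP_PORTS 65536   /* hypothetical: one slot per TCP/UDP port */

/* Simplified stand-in for ntop's PortCounter. */
typedef struct {
  unsigned short port;
  unsigned long long sent, rcvd;
} PortCounter;

static PortCounter *ipPorts[MAX_IP_PORTS];
static pthread_mutex_t portsMutex = PTHREAD_MUTEX_INITIALIZER;

/* Return the counter for `port`, creating it if needed.
   Caller must hold portsMutex. Returns NULL if calloc() fails. */
static PortCounter *getCounterLocked(unsigned short port) {
  if (ipPorts[port] == NULL) {
    PortCounter *pc = calloc(1, sizeof(*pc));
    if (pc == NULL)
      return NULL;
    pc->port = port;
    ipPorts[port] = pc;
  }
  return ipPorts[port];
}

/* Packet path: check, create and add all happen under the same lock,
   so the purge cannot free a counter between the test and the add.
   Returns -1 if either counter could not be allocated. */
int updatePorts(unsigned short sport, unsigned short dport, unsigned int length) {
  int rc = 0;

  pthread_mutex_lock(&portsMutex);
  PortCounter *s = getCounterLocked(sport);
  if (s != NULL) s->sent += length; else rc = -1;
  PortCounter *d = getCounterLocked(dport);
  if (d != NULL) d->rcvd += length; else rc = -1;
  pthread_mutex_unlock(&portsMutex);
  return rc;
}

/* Purge path: walk the table `chunk` slots at a time, taking the lock
   once per chunk, so packet processing can run between chunks.
   Frees every slot for simplicity; returns how many were freed. */
int purgePorts(int chunk) {
  int freed = 0;

  assert(chunk > 0);
  for (int start = 0; start < MAX_IP_PORTS; start += chunk) {
    pthread_mutex_lock(&portsMutex);
    for (int i = start; i < start + chunk && i < MAX_IP_PORTS; i++) {
      if (ipPorts[i] != NULL) {
        free(ipPorts[i]);
        ipPorts[i] = NULL;
        freed++;
      }
    }
    pthread_mutex_unlock(&portsMutex);
  }
  return freed;
}
```

Smaller chunks mean shorter waits for the packet thread but more lock round-trips during the purge.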

-----Burton

-----Original Message-----
From: Dominique Lalot [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 27, 2003 11:56 AM
To: Burton M. Strauss III
Subject: RE: [Ntop] PR_MHVS2W ntop stops after a while


My switch is buggy -> reboot it and you see all the traffic!

Just an idea, as I also patched that code. Right now:
You test if the sport counter is NULL
  if yes, init it
You test if the dport counter is NULL
  if yes, init it
fill value for sport
fill value for dport

As there are some race conditions, I suggest shortening the delay, as the
malloc() for dport could take a while.
I also noticed that it never happens for dport.
A traceEvent() with strerror(errno) after a malloc() failure would be better:
less work for support, as people will understand it is a memory problem.

I suggest instead:
You test if the sport counter is NULL
  if yes, init it
fill value for sport
You test if the dport counter is NULL
  if yes, init it
fill value for dport

Regards

Dom

At 06:52 27/05/2003 -0500, you wrote:

The 'beauty' of the patch is that, since it tests and issues a warning
message, it doesn't use the NULL value in the add, so it doesn't segfault.

You shouldn't see ANY of your TEMP5/TEMP6 messages.  Look over the code...

580  if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
581    traceEvent(CONST_TRACE_INFO, "TEMP: myGlobals.device[%d].ipPorts[%d] NULL", actualDeviceId, sport);
582    myGlobals.device[actualDeviceId].ipPorts[sport] = (PortCounter*)malloc(sizeof(PortCounter));
583    if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
584        traceEvent(CONST_TRACE_INFO, "TEMP: myGlobals.device[].ipPorts[] malloc() fail");
585    }

THERE SHOULD BE AN ELSE, SO...


586    myGlobals.device[actualDeviceId].ipPorts[sport] = (PortCounter*)malloc(sizeof(PortCounter));

586 is nonsense. Isn't one malloc() enough?


587    myGlobals.device[actualDeviceId].ipPorts[sport]->port = sport;
588    myGlobals.device[actualDeviceId].ipPorts[sport]->sent = 0;
589    myGlobals.device[actualDeviceId].ipPorts[sport]->rcvd = 0;
590  }
<snip />
603  if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
604    traceEvent(CONST_TRACE_INFO, "TEMP: myGlobals.device[%d].ipPorts[%d] NULL", actualDeviceId, sport);
605  } else
606  myGlobals.device[actualDeviceId].ipPorts[sport]->sent += length;

If it's NULL @ 580, it allocates the PortCounter structure at 582, and we
test (at 583) that the malloc() worked.  No message, no segfault
initializing the data at 586ff.

Yet it's null @ 603

It has to be that the purge, static void purgeIpPorts(int theDevice) {} in
main.c, is being fired off and overlapping the increment.  Adding a mutex()
there is going to be ugly performance-wise, but I can't help wondering WHY
you're getting bit so reliably.  Unless - do you have something like
WhatsUpGold probing your systems every 5 minutes?  Just infrequently enough
for ntop to think it's inactive and just often enough to hit this rare
condition??
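The suspected interleaving can be replayed deterministically in a single thread. In this sketch (hypothetical names; purgeOne() stands in for purgeIpPorts(), and the table is a simplified global rather than myGlobals), the counter exists at the 580 test, so nothing is created and no message is printed; the purge then frees it and stores NULL; and the 603 test sees NULL. Read sequentially the code says this can't happen, but with a purge slipped in between the two tests it does.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for ntop's PortCounter. */
typedef struct { unsigned long long sent, rcvd; } PortCounter;

static PortCounter *ipPorts[65536];

/* Stand-in for purgeIpPorts(): drops one counter as "idle". */
static void purgeOne(unsigned short port) {
  free(ipPorts[port]);
  ipPorts[port] = NULL;
}

/* Replays the suspected order of events step by step.
   Returns 1 if the add-time test (line 603) sees NULL, -1 on calloc failure. */
int replaySuspectedRace(unsigned short sport, unsigned int length) {
  /* earlier traffic already created the counter */
  ipPorts[sport] = calloc(1, sizeof(PortCounter));
  if (ipPorts[sport] == NULL)
    return -1;

  /* packet thread, line 580: not NULL, so nothing is created, no TEMP message */
  int createdAt580 = (ipPorts[sport] == NULL);

  /* purge thread runs here, between 580 and 603 */
  purgeOne(sport);

  /* packet thread, line 603: the guard catches it;
     the unpatched code would dereference NULL here */
  if (ipPorts[sport] == NULL) {
    printf("TEMP: ipPorts[%u] NULL (created at 580: %d)\n",
           (unsigned)sport, createdAt580);
    return 1;
  }
  ipPorts[sport]->sent += length;
  return 0;
}
```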

-----Burton







-----Original Message-----
From: Dominique Lalot [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 27, 2003 3:25 AM
To: Burton M. Strauss III
Subject: RE: [Ntop] PR_MHVS2W ntop stops after a while


At 17:32 24/05/2003 -0500, you wrote:

Argh... Looking at the patch, I used the same text for both messages, so you
can't tell them apart except by the MSGID.

581 & 593 are perfectly normal.  That's where it creates the counters if
they don't already exist.  The ones of interest are 604 & 608, the tests at
the point of the add.  What I want to see is whether it's not NULL at the
create test (so nothing is created) and yet somehow IS NULL at the time of
the add.

-----Burton

Burton,

What is strange is that with your patch applied, the Data Sent stats look
very small, and there are only a few errors!
ulimit is unlimited.

free
             total       used       free     shared    buffers     cached
Mem:       1030196     789556     240640          0     155220     408864
-/+ buffers/cache:     225472     804724
Swap:      2040212       1008    2039204

top
 2854 nobody    22   0 73712  71M  2016 S    99.9  7.1   0:00   2 ntop
 2861 nobody    15   0 73712  71M  2016 S     0.0  7.1   0:00   0 ntop
 2862 nobody    24   0 73712  71M  2016 S     0.0  7.1   0:00   2 ntop
 2863 nobody    15   0 73712  71M  2016 S     0.0  7.1   1:26   2 ntop
 2864 nobody    15   0 73712  71M  2016 S     0.0  7.1   0:13   3 ntop
 2865 nobody    16   0 73712  71M  2016 S     0.4  7.1   0:21   3 ntop
 2866 nobody    15   0 73712  71M  2016 S     2.4  7.1  20:49   2 ntop


[EMAIL PROTECTED] root]# grep TEMP /var/log/messages
May 26 19:10:17 ad01u253 ntop[2866]: [MSGID00608-pbuf] TEMP6: myGlobals.device[0].ipPorts[1776] NULL
May 27 05:54:18 ad01u253 ntop[2866]: [MSGID00604-pbuf] TEMP5: myGlobals.device[0].ipPorts[4664] NULL
May 27 05:54:18 ad01u253 ntop[2866]: [MSGID00608-pbuf] TEMP6: myGlobals.device[0].ipPorts[4503] NULL
May 27 08:46:35 ad01u253 ntop[2866]: [MSGID00604-pbuf] TEMP5: myGlobals.device[0].ipPorts[16683] NULL
May 27 08:46:35 ad01u253 ntop[2866]: [MSGID00608-pbuf] TEMP6: myGlobals.device[0].ipPorts[6887] NULL


  if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
    traceEvent(CONST_TRACE_INFO, "TEMP5: myGlobals.device[%d].ipPorts[%d] NULL", actualDeviceId, sport);
  } else
    myGlobals.device[actualDeviceId].ipPorts[sport]->sent += length;
  if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
    traceEvent(CONST_TRACE_INFO, "TEMP6: myGlobals.device[%d].ipPorts[%d] NULL", actualDeviceId, dport);
  } else
    myGlobals.device[actualDeviceId].ipPorts[dport]->rcvd += length;
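As written, the TEMP6 guard tests ipPorts[sport] but prints dport and then increments ipPorts[dport], so it fires when the sport counter is NULL and does not protect the dport dereference. A sketch with each guard matched to the slot it dereferences, using a simplified global table and a hypothetical addTraffic() in place of myGlobals.device[].ipPorts[] and traceEvent():

```c
#include <stdio.h>

/* Simplified stand-in for ntop's PortCounter. */
typedef struct { unsigned long long sent, rcvd; } PortCounter;

#define MAX_IP_PORTS 65536
static PortCounter *ipPorts[MAX_IP_PORTS];   /* stands in for myGlobals.device[id].ipPorts */

/* Each guard tests the same slot it then dereferences.
   Returns the number of adds skipped because a slot was NULL. */
int addTraffic(int deviceId, unsigned short sport, unsigned short dport,
               unsigned int length) {
  int skipped = 0;

  if (ipPorts[sport] == NULL) {
    printf("TEMP5: device[%d].ipPorts[%u] NULL\n", deviceId, (unsigned)sport);
    skipped++;
  } else {
    ipPorts[sport]->sent += length;
  }

  if (ipPorts[dport] == NULL) {          /* test dport, not sport */
    printf("TEMP6: device[%d].ipPorts[%u] NULL\n", deviceId, (unsigned)dport);
    skipped++;
  } else {
    ipPorts[dport]->rcvd += length;
  }
  return skipped;
}
```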




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, May 23, 2003 11:50 AM
To: Burton M. Strauss III
Subject: Re: [Ntop] PR_MHVS2W ntop stops after a while


I applied it, then stopped ntop, as there were too many prints:

23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[3329] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[37179] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[37209] NULL
23/May/2003 18:22:19 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[1235] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[1222] NULL
23/May/2003 18:22:19 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[4665] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[1708] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[1053] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[37135] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[3290] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[2354] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[1620] NULL
23/May/2003 18:22:19 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[56721] NULL
23/May/2003 18:22:19 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[2978] NULL
23/May/2003 18:22:20 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[1709] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[50010] NULL
23/May/2003 18:22:20 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[2355] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[8050] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[4655] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[4537] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[4656] NULL
23/May/2003 18:22:20 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[2356] NULL
23/May/2003 18:22:20 [MSGID00593-pbuf] TEMP: myGlobals.device[0].ipPorts[2358] NULL
23/May/2003 18:22:20 [MSGID00581-pbuf] TEMP: myGlobals.device[0].ipPorts[1425] NULL

I just added some modifications: a number for each TEMP, in the order of your
calls:

23/May/2003 18:38:34 [MSGID00581-pbuf] TEMP1: myGlobals.device[0].ipPorts[39192] NULL
23/May/2003 18:38:34 [MSGID00581-pbuf] TEMP1: myGlobals.device[0].ipPorts[4177] NULL
23/May/2003 18:38:34 [MSGID00593-pbuf] TEMP3: myGlobals.device[0].ipPorts[36514] NULL
23/May/2003 18:38:34 [MSGID00581-pbuf] TEMP1: myGlobals.device[0].ipPorts[34019] NULL
23/May/2003 18:38:34 [MSGID00593-pbuf] TEMP3: myGlobals.device[0].ipPorts[21027] NULL
23/May/2003 18:38:34 [MSGID00593-pbuf] TEMP3: myGlobals.device[0].ipPorts[7215] NULL
23/May/2003 18:38:34 [MSGID00581-pbuf] TEMP1: myGlobals.device[0].ipPorts[24077] NULL
23/May/2003 18:38:34 [MSGID00581-pbuf] TEMP1: myGlobals.device[0].ipPorts[3418] NULL

Then I commented out 1 and 3, as the worst case would come from 2 and 4
(malloc returning NULL).
I hope it's not a Linux threading problem.

I saw strange behavior in mod_perl, which is threaded under Apache:
main modifies a global
calls a sub
the sub does not see the same address for the global
returns
main is still working with the right address

I was obliged to pass it to the proc as a new parameter...
and here I see myGlobals...

But you're much more experienced than me on such subjects.

I'm now leaving ntop running under gdb all weekend.

Bye

Dominique

On Thu, May 22, 2003 at 11:28:51AM -0500, Burton M. Strauss III wrote:
> That's the same problem that's been reported by others, except according to
> the code, it can't happen...
>
>   if(myGlobals.device[actualDeviceId].ipPorts[sport] == NULL) {
>     myGlobals.device[actualDeviceId].ipPorts[sport] = (PortCounter*)malloc(sizeof(PortCounter));
>     myGlobals.device[actualDeviceId].ipPorts[sport]->port = sport;
>     myGlobals.device[actualDeviceId].ipPorts[sport]->sent = 0;
>     myGlobals.device[actualDeviceId].ipPorts[sport]->rcvd = 0;
>   }
>
>   if(myGlobals.device[actualDeviceId].ipPorts[dport] == NULL) {
>     myGlobals.device[actualDeviceId].ipPorts[dport] = (PortCounter*)malloc(sizeof(PortCounter));
>     myGlobals.device[actualDeviceId].ipPorts[dport]->port = dport;
>     myGlobals.device[actualDeviceId].ipPorts[dport]->sent = 0;
>     myGlobals.device[actualDeviceId].ipPorts[dport]->rcvd = 0;
>   }
>
>   myGlobals.device[actualDeviceId].ipPorts[sport]->sent += length;
>   myGlobals.device[actualDeviceId].ipPorts[dport]->rcvd += length;
>
> If ipPorts doesn't exist yet, it's created just above.  And we've had others
> test the malloc() returns.
>
> Since you've only got a single interface, thread problems can't be it.
>
> Tell you what, since you seem able to recreate it almost at will...
>
> Apply the attached patch to make it tell us when it's allocating the
> PortCounter and again to test just before the add.
>
> grep out the TEMP lines and send them.  Note that the patch should skip the
> add if it's really NULL, so it won't bomb.
>
> If it DOES bomb, then it's getting corrupted (vs. NULL) and I'll want to see
> the values for myGlobals.device[0].ipPorts[x] where x are the dport and
> sport values.  To print local variables, say from processPacket, you need to
> use
> (gdb) frame 1
> first.
>
> -----Burton
>
> -----Original Message-----
> From: Dominique Lalot [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 22, 2003 12:54 AM
> To: Burton M. Strauss III
> Subject: Re: [Ntop] PR_MHVS2W ntop stops after a while
>
>
> Burton,
> I was lucky enough to get the failure; with the latest snapshot (2.2.1) it
> seems to be the same problem.
> Using the latest snapshot from yesterday (ntop-03-05-21), I was able to trace
> it using gdb as described in the FAQ.
> Is it usable for you?
>
> Thanks
>
> Beware that the configuration is almost the same, except that -C no longer
> works.
>
> Dominique
>
>
>
> last snapshot
> (gdb) set args -u root @manaix.ntop -L -K
> (gdb) run
> Starting program: /usr/bin/.libs/lt-ntop -u root @manaix.ntop -L -K
> [New Thread 16384 (LWP 30646)]
> Wait please: ntop is coming up...
> Processing file manaix.ntop for parameters...
> [New Thread 32769 (LWP 30653)]
> [New Thread 16386 (LWP 30654)]
> [New Thread 32771 (LWP 30655)]
> [New Thread 49156 (LWP 30656)]
> [New Thread 65541 (LWP 30657)]
> [New Thread 81926 (LWP 30658)]
>
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 81926 (LWP 30658)]
> 0x4009b082 in updateInterfacePorts (actualDeviceId=0, sport=320, dport=80, length=62) at pbuf.c:593
> 593 myGlobals.device[actualDeviceId].ipPorts[sport]->sent += length;
> (gdb) list
> 588 myGlobals.device[actualDeviceId].ipPorts[dport]->port = dport;
> 589 myGlobals.device[actualDeviceId].ipPorts[dport]->sent = 0;
> 590 myGlobals.device[actualDeviceId].ipPorts[dport]->rcvd = 0;
> 591 }
> 592
> 593 myGlobals.device[actualDeviceId].ipPorts[sport]->sent += length;
> 594 myGlobals.device[actualDeviceId].ipPorts[dport]->rcvd += length;
> 595
> 596 #ifdef CFG_MULTITHREADED
> 597 releaseMutex(&myGlobals.gdbmMutex);
> (gdb) bt
> #0 0x4009b082 in updateInterfacePorts (actualDeviceId=0, sport=320, dport=80, length=62) at pbuf.c:593
> No locals.
> #1 0x4009d001 in processIpPkt (bp=0x8087928 "E", h=0xbefff9cc, length=62, ether_src=0xbefff962 "", ether_dst=0xbefff95c "", actualDeviceId=0, vlanId=-1) at pbuf.c:990
> sport = 2247
> dport = 80
> ip = {ip_hl = 5, ip_v = 4, ip_tos = 0 '\0', ip_len = 12288, ip_id = 12935, ip_off = 64, ip_ttl = 125 '}', ip_p = 6 '\006', ip_sum = 28095, ip_src = {s_addr = 2472469770}, ip_dst = {s_addr = 3232267798}}
> tp = {th_sport = 50952, th_dport = 20480, th_seq = 1324592302, th_ack = 0, th_x2 = 0 '\0', th_off = 7 '\a', th_flags = 2 '\002', th_win = 64, th_sum = 58592, th_urp = 0}
> up = {uh_sport = 13568, uh_dport = 37622, uh_ulen = 51712, uh_sum = 62118}
> icmpPkt = {icmp_type = 0 '\0', icmp_code = 0 '\0', icmp_cksum = 0,
>   icmp_hun = {ih_pptr = 0 '\0', ih_gwaddr = {s_addr = 0}, ih_idseq = {icd_id = 0, icd_seq = 0}, ih_void = 0,
>     ih_pmtu = {ipm_void = 0, ipm_nextmtu = 0}, ih_rtradv = {irt_num_addrs = 0 '\0', irt_wpa = 0 '\0', irt_lifetime = 0}},
>   icmp_dun = {id_ts = {its_otime = 0, its_rtime = 1077415822, its_ttime = 1570113147},
>     id_ip = {idi_ip = {ip_hl = 0, ip_v = 0, ip_tos = 0 '\0', ip_len = 0, ip_id = 3982, ip_off = 16440, ip_ttl = 123 '{', ip_p = 6 '\006', ip_sum = 23958, ip_src = {s_addr = 182804115}, ip_dst = {s_addr = 749316288}}},
>     id_radv = {ira_addr = 0, ira_preference = 1077415822}, id_mask = 0, id_data = ""}}
> hlen = 20
> tcpDataLength = 0
> udpDataLength = 194
> off = 16384
> tcpUdpLen = 28
> srcHostIdx = 1
> dstHostIdx = 1
> srcHost = (struct hostTraffic *) 0x8085ec8
> dstHost = (struct hostTraffic *) 0x8085ec8
> forceUsingIPaddress = 0 '\0'
> tvstrct = {tv_sec = 0, tv_usec = 0}
> theData = (u_char *) 0x8087958 <binary packet data>
> ctr = {value = 62, modified = 68 'D'}
> #2 0x4009fc59 in processPacket (_deviceId=0x0, h=0xbefff9cc, p=0x808791a "") at pbuf.c:2496
> pppoe_hdr = (struct pppoe_hdr *) 0x3e
> protocol = 0
> ehdr = {ether_dhost = "\000\t��P\200", ether_shost = "\000\0010\035�`", ether_type = 8}
> trp = (struct tokenRing_header *) 0x0
> fddip = (struct fddi_header *) 0x400ba380
> hlen = 14
> caplen = 62
> headerDisplacement = 0
> length = 62
> orig_p = (const u_char *) 0x808791a ""
> p1 = (const u_char *) 0x0
> ether_src = (u_char *) 0xbefff962 ""
> ether_dst = (u_char *) 0xbefff95c ""
> eth_type = 2048
> trllc = (struct tokenRing_llc *) 0x3e
> fd = (struct _IO_FILE *) 0x0
> ipxBuffer = <binary data>
> actualDeviceId = 0
> vlanId = -1
> #3 0x405150ca in pcap_read () from /usr/lib/libpcap.so.0.6.2
> No symbol table info available.
> #4 0x4051663b in pcap_dispatch () from /usr/lib/libpcap.so.0.6.2
> No symbol table info available.
> #5 0x40097c69 in pcapDispatch (_i=0x0) at ntop.c:109
> rc = 0
> pcap_fd = 4
> readMask = {__fds_bits = {16, 0 <repeats 31 times>}}
> #6 0x4037cc0e in pthread_start_thread_event () from /lib/libpthread.so.0
> No symbol table info available.
> (gdb) print deviceId
> No symbol "deviceId" in current context.
> (gdb)
>
>

_______________________________________________
Ntop mailing list
[EMAIL PROTECTED]
http://listgateway.unipi.it/mailman/listinfo/ntop
