The idea of something concurrent affecting the variables makes sense.  I
ran it again with the trace level set to 5.  It crashed calling the same
function with a NULL pointer from a different place, right after a line
where the same dereference succeeded.

There isn't anything about the idle purge in the trace output.  I've
included the end of the output below but I did search for the words idle
and purge in the rest of the log.


gdb run on core dump:

(gdb) bt
#0  0xb5bf4857 in memset () from /lib/tls/libc.so.6
#1  0x89a5c648 in ?? ()
#2  0xb5f839b1 in resetUsageCounter (counter=0x170c) at util.c:3690
#3  0xb5f83ca1 in resetSecurityHostTraffic (el=0x89a5c648) at util.c:3747
#4  0xb5f65802 in allocateSecurityHostPkts (srcHost=0x89a5c648) at
pbuf.c:123
#5  0xb23e8d68 in handleGenericFlow (recordActTime=-1862720189,
recordSysUpTime=-1775288765,
    record=0xadd2aab0, deviceId=7) at netflowPlugin.c:612
#6  0xb23e9c62 in dissectFlow (buffer=0xadd2b200 "", bufferLen=1464,
deviceId=7) at netflowPlugin.c:1276
#7  0xb23eab7f in netflowMainLoop (_deviceId=0x7) at netflowPlugin.c:1469
#8  0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0
#9  0xb5c569ea in clone () from /lib/tls/libc.so.6
(gdb) frame 3
#3  0xb5f83ca1 in resetSecurityHostTraffic (el=0x89a5c648) at util.c:3747
3747      resetUsageCounter(&el->secHostPkts->tcpToDiagnosticPortRcvd);
(gdb) list
3742      resetUsageCounter(&el->secHostPkts->ackXmasFinSynNullScanRcvd);
3743      resetUsageCounter(&el->secHostPkts->rejectedTCPConnRcvd);
3744      resetUsageCounter(&el->secHostPkts->establishedTCPConnRcvd);
3745      resetUsageCounter(&el->secHostPkts->udpToClosedPortRcvd);
3746      resetUsageCounter(&el->secHostPkts->udpToDiagnosticPortRcvd);
3747      resetUsageCounter(&el->secHostPkts->tcpToDiagnosticPortRcvd);
3748      resetUsageCounter(&el->secHostPkts->tinyFragmentRcvd);
3749      resetUsageCounter(&el->secHostPkts->icmpFragmentRcvd);
3750      resetUsageCounter(&el->secHostPkts->overlappingFragmentRcvd);
3751      resetUsageCounter(&el->secHostPkts->closedEmptyTCPConnRcvd);
(gdb) print el->secHostPkts
$1 = (SecurityHostProbes *) 0x0
(gdb) info thread
  26 process 20452  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  25 process 20454  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  24 process 20455  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  23 process 20456  0xb5bebf50 in free () from /lib/tls/libc.so.6
  22 process 20457  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  21 process 20459  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  20 process 20460  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  19 process 20461  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  18 process 20462  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  17 process 20463  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  16 process 20464  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  15 process 20465  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  14 process 20467  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  13 process 20468  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  12 process 20469  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  11 process 20470  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  10 process 20471  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  9 process 20472  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  8 process 20473  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  7 process 20474  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  6 process 20475  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  5 process 20476  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  4 process 20477  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  3 process 20478  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  2 process 20479  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
* 1 process 20466  0xb5bf4857 in memset () from /lib/tls/libc.so.6


End of trace output:

Mon Sep 12 16:12:18 2005  THREADMGMT[t2875988912]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2847357872]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2817854384]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t3033750448]: SFP: Fingerprint scan
thread running [p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2903215024]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2978061232]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2966432688]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2929036208]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t3023260592]: SIH: Idle host scan
thread running [p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2802809776]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2771864496]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2861816752]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:18 2005  THREADMGMT[t2787478448]: NETFLOW: thread running
[p20452]
Mon Sep 12 16:12:28 2005  THREADMGMT[t2736782256]: RRD: Throughput data
collection: Thread starting [p20452]
Mon Sep 12 16:12:28 2005  THREADMGMT[t2736782256]: RRD: Throughput data
collection: Thread running [p20452]
Mon Sep 12 16:12:28 2005  THREADMGMT[t2736782256]: RRD: Started thread for
throughput data collection
Mon Sep 12 16:12:28 2005  THREADMGMT[t2761374640]: RRD: Data collection
thread running [p20452]


Syslog entries:
Sep 12 16:01:41 localhost kernel: device eth0 entered promiscuous mode
Sep 12 16:01:41 localhost kernel: application bug: ntop(20313) has SIGCHLD
set to SIG_IGN but calls wait().
Sep 12 16:01:41 localhost kernel: (see the NOTES section of 'man 2 wait').
Workaround activated.
Sep 12 16:01:47 localhost kernel: device eth0 left promiscuous mode
Sep 12 16:06:12 localhost ntop[20417]:   THREADMGMT[t3045949568]: ntop
RUNSTATE: PREINIT(1)
Sep 12 16:06:12 localhost ntop[20417]:   THREADMGMT[t3045949568]: ntop
RUNSTATE: INIT(2)
Sep 12 16:06:12 localhost kernel: device eth0 entered promiscuous mode
Sep 12 16:06:13 localhost kernel: application bug: ntop(20417) has SIGCHLD
set to SIG_IGN but calls wait().
Sep 12 16:06:13 localhost kernel: (see the NOTES section of 'man 2 wait').
Workaround activated.
Sep 12 16:09:14 localhost kernel: device eth0 left promiscuous mode
Sep 12 16:12:17 localhost ntop[20452]:   THREADMGMT[t3045949568]: ntop
RUNSTATE: PREINIT(1)
Sep 12 16:12:17 localhost ntop[20452]:   THREADMGMT[t3045949568]: ntop
RUNSTATE: INIT(2)
Sep 12 16:12:17 localhost kernel: device eth0 entered promiscuous mode
Sep 12 16:12:17 localhost kernel: application bug: ntop(20452) has SIGCHLD
set to SIG_IGN but calls wait().
Sep 12 16:12:17 localhost kernel: (see the NOTES section of 'man 2 wait').
Workaround activated.
Sep 12 16:59:42 localhost kernel: device eth0 left promiscuous mode



                                                                           
             "Burton Strauss"                                              
             <[EMAIL PROTECTED]                                             
             rt.com>                                                    To 
             Sent by:                  <[email protected]>                 
             ntop-dev-bounces@                                          cc 
             unipi.it                                                      
                                                                   Subject 
                                       RE: [Ntop-dev] Core dump in 3.2rc1  
             09/12/2005 04:54          netflow handling                    
             PM                                                            
                                                                           
                                                                           
             Please respond to                                             
             [email protected]                                             
                                                                           
                                                                           




_dl_sysinfo_int80 usually means a deadlock in malloc() or similar call
(it's
the user-kernel interface call).

SIGSEGV in the call to malloc() in leaks.c, line 75 is also consistent w/ a
malloc() chain corruption.

Problem is these are all symptoms - the actual problem could be many
seconds
or minutes before.

"The NULL pointer looks like trouble.":  Right - here's the code block:

void resetSecurityHostTraffic(HostTraffic *el) {
  if(el->secHostPkts == NULL) return;
  resetUsageCounter(&el->secHostPkts->synPktsSent);
...

What this appears to mean is that it was NOT null at the start of this
block
of code and yet is now is.

So... Pls check in the log and see if idle purge was kicking around (you
may
need to go to a higher trace level to see the IDLE_PURGE messages)

-----Burton

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf
Of [EMAIL PROTECTED]
Sent: Monday, September 12, 2005 3:25 PM
To: [email protected]
Subject: [Ntop-dev] Core dump in 3.2rc1 netflow handling

I'm having trouble with ntop dying.  I ran it through gdb and come up with
the information listed below.  I ran it once from within gdb and once from
outside, then pulling the core into gdb.  I'm running it on RedHat AS 3 and
built it myself.  Ntop crashes pretty easily so if there is something I
should try let me know and I can try it.


Run inside gdb:


Error type: Segmentation fault

(gdb) info thread
......
  9 Thread -1328637008 (LWP 19784)  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  8 Thread -1315648592 (LWP 19783)  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
* 7 Thread -1304302672 (LWP 19782)  0xb5becaaa in _int_malloc () from
/lib/tls/libc.so.6
  6 Thread -1292686416 (LWP 19781)  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
  5 Thread -1282196560 (LWP 19780)  0xb75ebc32 in _dl_sysinfo_int80 () from
/lib/ld-linux.so.2
....

(gdb) thread 7
[Switching to thread 7 (Thread -1304302672 (LWP 19782))]#2  0xb5f62062 in
ntop_safemalloc (sz=1,
    file=0xb5f8d5c5 "hash.c", line=954) at leaks.c:75
75        thePtr = malloc(sz);

(gdb) list
70                               */
71        }
72      #endif
73
74      #ifndef USE_GC
75        thePtr = malloc(sz);
76      #else
77        thePtr = GC_malloc_atomic(sz);
78      #endif
79

(gdb) print sz
$4 = 1

(gdb) bt
#0  0xb5becaaa in _int_malloc () from /lib/tls/libc.so.6
#1  0xb5bebdfd in malloc () from /lib/tls/libc.so.6
#2  0xb5f62062 in ntop_safemalloc (sz=1, file=0xb5f8d5c5 "hash.c",
line=954) at leaks.c:75
#3  0xb5f5b4dc in _lookupHost (hostIpAddress=0xb241cd10, ether_addr=0x0,
vlanId=0,
    checkForMultihoming=0 '\0', forceUsingIPaddress=1 '\001',
actualDeviceId=1,
    file=0xb250b639 "netflowPlugin.c", line=497) at hash.c:954
#4  0xb25051ad in handleGenericFlow (recordActTime=869082435,
recordSysUpTime=-1783133275,
    record=0xb241dab0, deviceId=1) at netflowPlugin.c:497
#5  0xb2505c62 in dissectFlow (buffer=0xb241e200 "", bufferLen=1464,
deviceId=1) at netflowPlugin.c:1276
#6  0xb2506b7f in netflowMainLoop (_deviceId=0x1) at netflowPlugin.c:1469
#7  0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0
#8  0xb5c569ea in clone () from /lib/tls/libc.so.6


Run from the command line:

Mon Sep 12 14:21:15 2005  THREADMGMT[t2733636528]: RRD: Throughput data
collection: Thread running [p19952]
Mon Sep 12 14:21:15 2005  THREADMGMT[t2733636528]: RRD: Started thread for
throughput data collection Mon Sep 12 14:21:15 2005
THREADMGMT[t2762632112]: RRD: Data collection thread running [p19952]
Segmentation fault (core dumped) # ls -lt | head total 306628
-rw-------    1 root     root     565616640 Sep 12 14:30 core.19952
....

# gdb /home/hc05/ntop-mine/bin/ntop core.19952 GNU gdb Red Hat Linux
(5.3.90-0.20030710.40rh)

(gdb) bt
#0  0xb5bf4857 in memset () from /lib/tls/libc.so.6
#1  0x104eec48 in ?? ()
#2  0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690
#3  0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713
#4  0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at
pbuf.c:123
#5  0xb2504d5c in handleGenericFlow (recordActTime=685188419,
recordSysUpTime=-1795279914,
    record=0xaea8aab0, deviceId=6) at netflowPlugin.c:612
#6  0xb2505c62 in dissectFlow (buffer=0xaea8b200 "", bufferLen=1464,
deviceId=6) at netflowPlugin.c:1276
#7  0xb2506b7f in netflowMainLoop (_deviceId=0x6) at netflowPlugin.c:1469
#8  0xb5d0cdac in start_thread () from /lib/tls/libpthread.so.0
#9  0xb5c569ea in clone () from /lib/tls/libc.so.6


(gdb) frame 2
#2  0xb5f839b1 in resetUsageCounter (counter=0x49c) at util.c:3690
3690      memset(counter, 0, sizeof(UsageCounter));
(gdb) list
3685    /* ******************************* */
3686
3687    void resetUsageCounter(UsageCounter *counter) {
3688      int i;
3689
3690      memset(counter, 0, sizeof(UsageCounter));
3691
3692      for(i=0; i<MAX_NUM_CONTACTED_PEERS; i++)
3693        setEmptySerial(&counter->peersSerials[i]);
3694    }
(gdb) print counter
$1 = (UsageCounter *) 0x49c

(gdb) frame 3
#3  0xb5f83a6c in resetSecurityHostTraffic (el=0x104eec48) at util.c:3713
3713      resetUsageCounter(&el->secHostPkts->nullPktsSent);
(gdb) print el
$2 = (HostTraffic *) 0x104eec48
(gdb) print el->secHostPkts->nullPktsSent Cannot access memory at address
0x49c
(gdb) print el->secHostPkts
$3 = (SecurityHostProbes *) 0x0
(gdb)

The NULL pointer looks like trouble.


(gdb) frame 4
#4  0xb5f65802 in allocateSecurityHostPkts (srcHost=0x104eec48) at
pbuf.c:123
123         resetSecurityHostTraffic(srcHost);
(gdb) list
118     /* ******************************* */
119
120     void allocateSecurityHostPkts(HostTraffic *srcHost) {
121       if(srcHost->secHostPkts == NULL) {
122         if((srcHost->secHostPkts =
(SecurityHostProbes*)malloc(sizeof(SecurityHostProbes))) == NULL) return;
123         resetSecurityHostTraffic(srcHost);
124       }
125     }
126
127     /* ************************************ */
(gdb) print srcHost->secHostPkts
$4 = (SecurityHostProbes *) 0x0

I don't know why the test on line 122 didn't catch the NULL.  Also, I don't
know why the malloc failed unless it is because the core usage is really
big.

_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev


_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

Reply via email to