Title: Re: [Ntop-dev] A lot of "**WARNING** Error: bad magic number[expected=1968/real=0][deviceId=0]" messages...
Hi Burton,

A version with your latest patches was started Friday morning. It ran OK until 3:44 AM this morning and crashed with no error message whatsoever in the logs, like it did twice last week (before updating to the new CVS version):

# ps -axf | grep ntop
 1824 pts/1    S      0:00  |                   \_ grep ntop
16185 ?        S      0:00 /usr/local/ntop3cvs/bin/ntop -i eth1 @/etc/ntop3cvs.conf
16186 ?        S      0:00 /usr/local/ntop3cvs/bin/ntop -i eth1 @/etc/ntop3cvs.conf
# [EMAIL PROTECTED] ntop]# gdb /usr/local/ntop3cvs/bin/ntop 16185
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
... <snip>
Loaded symbols for /lib/libnss_dns.so.2
0xffffe002 in ?? ()
(gdb) info stack
#0  0xffffe002 in ?? ()
#1  0x42095eee in localtime_r () from /lib/tls/libc.so.6
#2  0x400e9b3b in traceEvent (eventTraceLevel=1, file=0x40087b3d "http.c", line=1741,
    format=0x4008b080 "http generation failed, alarm() tripped. Please report this to ntop-dev list!") at util.c:2529
#3  0x400404fb in quitNow (signo=14) at http.c:1741
#4  <signal handler called>
#5  0xffffe002 in ?? ()
#6  0x42095eee in localtime_r () from /lib/tls/libc.so.6
#7  0x4003fc72 in sendHTTPHeader (mimeType=4, headerFlags=0, useCompressionIfAvailable=1) at http.c:1431
#8  0x40041a13 in returnHTTPPage (pageName=0x4497b479 "pktSizeDistribPie.png", postLen=-1, from=0x4497b698, httpRequestedAt=0x4497af20, usedFork=0x4497af14,
    agent=0x4497b338 "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; fr-fr) AppleWebKit/312.1 (KHTML, like Gecko) Safari/312",
    referer=0x4497b238 "http://gbp.sti.usherbrooke.ca:3000/", requestedLanguage=0x4497b128, numLang=0, isPostMethod=0) at http.c:2662
#9  0x40044977 in handleHTTPrequest (from=
              {hostFamily = 2, addr = {_hostIp4Address = {s_addr = 2228357892}, _hostIp6Address = {in6_u = {u6_addr8 = "[EMAIL PROTECTED]", u6_addr16 = {2820, 34002, 148, 0, 42368, 16385, 23632, 16896}, u6_addr32 = {2228357892, 148, 1073849728, 1107319888}}}}}) at http.c:3578
#10 0x4007e4be in handleSingleWebConnection (fdmask=0x4497b6ac) at webInterface.c:9038
#11 0x4007e26c in handleWebConnections (notUsed=0x0) at webInterface.c:8890
#12 0x4181a484 in start_thread () from /lib/tls/libpthread.so.0
(gdb) info threads
  1 Thread 1150794544 (LWP 16185)  0xffffe002 in ?? ()
(gdb)


Thread 1150794544  corresponds to a web server thread:

Aug 19 08:33:12 gbp ntop[14810]:  [MSGID9305063] THREADMGMT[t1150794544]: WEB: Server connection thread running [p14810]

but process 14810 which started it is dead. There’s another process, 16186. Doing an “info stack” in gdb of this one gives almost identical results as for process # 16185.

And here are the last lines of the log:

$ tail /var/log/ntop3cvs.log
Aug 22 03:39:49 gbp ntop[14810]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 1609 - checked 79, resolved 0
Aug 22 03:40:53 gbp ntop[14810]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1419 [out of 11137] hosts selected
Aug 22 03:40:53 gbp ntop[14810]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1419/11136 hosts deleted, elapsed time is 0.013701 seconds (0.000010 per host)
Aug 22 03:42:19 gbp ntop[14810]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 1610 - checked 81, resolved 0
Aug 22 03:42:53 gbp ntop[14810]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 2054 [out of 11265] hosts selected
Aug 22 03:42:53 gbp ntop[14810]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 2054/11264 hosts deleted, elapsed time is 0.020238 seconds (0.000010 per host)
Aug 22 03:43:39 gbp ntop[14810]:  [MSGID8931710] RRD: Cycle 807 ended, 59 RRDs updated
Aug 22 03:44:49 gbp ntop[14810]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 1611 - checked 81, resolved 0
Aug 22 03:44:53 gbp ntop[14810]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1330 [out of 10670] hosts selected
Aug 22 03:44:53 gbp ntop[14810]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1330/10669 hosts deleted, elapsed time is 0.012306 seconds (0.000009 per host)


And finally to answer your question, yes this is a SMP machine (single xeon, 2 virtual CPUs):

$ uname –a
Linux gbp.sti.usherbrooke.ca 2.4.20-31.9smp #1 SMP Tue Apr 13 17:40:10 EDT 2004 i686 i686 i386 GNU/Linux
$ cat /etc/redhat-release
Red Hat Linux release 9 (Shrike)
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.283
cache size      : 512 KB
... <snip>

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7
cpu MHz         : 2392.283
cache size      : 512 KB
... <snip>

I read what you wrote in globals-defines.h regarding the new MEMORY_DEBUG settings and I wasn’t sure if these settings would help for me (I didn’t use them).

I will restart ntop and hope that the “mad magic number” comes back before it crashes again...

Thank you for your help,

Marc.
--
Marc Mazuhelli
Computer security analyst
Service des technologies de l'information
Université de Sherbrooke



le 05/08/19 10:03, Burton Strauss à [EMAIL PROTECTED] a écrit :

Marc.

Try the CVS version.  The patch I put in yesterday should help narrow things down.  First off, ntop shouldn't crash, because it will no longer attempt to use the HostTraffic entry with the bad magic number.  That isn't affecting you, but it does affect others.  

Secondly, there is now an 'unmagic' number, 1290, which is set in HostTraffic during the start of the delete process.  Because of the timing of the messages (between selection and completion of the purge), I'm suspicious that we are reusing deleted entries and/or have a pointer to them during the delete process.  So if your messages become:

Aug 17 06:46:34 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=1290][deviceId=0] getFirstHost()[address.c/54]

That will tell us the problem is reusing deleted entries.  If it's still zero, then the problem is something else.

One other question - is this an SMP machine?  (Either two real processors, or dual cores or even HyperThreading)

-----Burton


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Mazuhelli
Sent: Wednesday, August 17, 2005 8:21 AM
To: [email protected]
Subject: Re: [Ntop-dev] A lot of "**WARNING** Error: bad magic number[expected=1968/real=0][deviceId=0]" messages...

Hi all (specially Burton!),

On 08/09/2005 Burton Strauss <[EMAIL PROTECTED]> wrote:

Next Steps:  Grab the cvs (wait for the 621 and 622 commit  messages) and rerun the test - when it goes bad, please do the gdb stuff  above, and also grab the updated log messages.  This should give us much  more info.

-----Burton

I grabbed the new CVS on the same day (including 621+622 commits), restarted ntop and it ran without incidents for a few days. It finally crashed without any error messages in the log, Only one thread was still running (and it was not the web server thread). I just restarted ntop and the same thing happened another time – I restarted again.

But this morning the famous “bad magic number” came back:

Aug 17 06:43:57 gbp ntop[23566]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 455 - checked 105, resolved 0
Aug 17 06:44:34 gbp ntop[23566]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 934 [out of 8303] hosts selected
Aug 17 06:44:34 gbp ntop[23566]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 934/8302 hosts deleted, elapsed time is 0.044033 seconds (0.000047 per host)
Aug 17 06:45:42 gbp ntop[23566]:  [MSGID8931710] RRD: Cycle 229 ended, 60 RRDs updated
Aug 17 06:46:27 gbp ntop[23566]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 456 - checked 104, resolved 0
Aug 17 06:46:34 gbp ntop[23566]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1201 [out of 8563] hosts selected
Aug 17 06:46:34 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:46:34 gbp last message repeated 11 times
Aug 17 06:46:34 gbp ntop[23566]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1201/8562 hosts deleted, elapsed time is 0.046369 seconds (0.000039 per host)
Aug 17 06:46:34 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:47:05 gbp last message repeated 461780 times
Aug 17 06:48:06 gbp last message repeated 996820 times
Aug 17 06:48:35 gbp last message repeated 475428 times
Aug 17 06:48:35 gbp ntop[23566]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1271 [out of 7362] hosts selected
Aug 17 06:48:35 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:48:35 gbp last message repeated 11 times
Aug 17 06:48:35 gbp ntop[23566]:  [MSGID8477291] IDLE_PURGE: Device 0 [eth1]: 1271/7361 hosts deleted, elapsed time is 0.050494 seconds (0.000040 per host)
Aug 17 06:48:35 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:48:57 gbp last message repeated 362506 times
Aug 17 06:48:57 gbp ntop[23566]:  [MSGID8757584] SFP: Ending fingerprint scan cycle 457 - checked 101, resolved 0
Aug 17 06:48:57 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:49:27 gbp last message repeated 501962 times
Aug 17 06:50:29 gbp last message repeated 963810 times
Aug 17 06:50:35 gbp last message repeated 105795 times
Aug 17 06:50:35 gbp ntop[23566]:  [MSGID0825709] IDLE_PURGE: Device 0 [eth1] FINISHED selection, 1559 [out of 6091] hosts selected
Aug 17 06:50:35 gbp ntop[23566]:  [MSGID8850278] **WARNING** Error: bad magic number [expected=1968/real=0][deviceId=0] getFirstHost()[address.c/54]
Aug 17 06:50:35 gbp last message repeated 11 times


This new version gives us some new information: the “bad magic number” message was triggered when address.c calls getFirstHost() at line 54.


Here is the “gdb stuff” you asked:

[EMAIL PROTECTED] root]# gdb /usr/local/ntop3cvs/bin/ntop 23566
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
...
<snip>
(gdb) info stack
#0  0xffffe002 in ?? ()
#1  0x400ebc1e in _ntopSleepWhileSameState (file=0x8052114 "main.c", line=588, uSeconds=4294967292) at util.c:3996
#2  0x0804ad2b in main (argc=4, argv=0xbffff7e4) at main.c:588
#3  0x42015704 in __libc_start_main () from /lib/tls/libc.so.6

(gdb) info threads
  9 Thread 1117223728 (LWP 23569)  0xffffe002 in ?? () <== THREADMGMT[t1117223728]: NPA: network packet analyzer (packet processor) thread running [p23566]
8 Thread 1125616432 (LWP 23570)  0xffffe002 in ?? () <== THREADMGMT[t1125616432]: SFP: Fingerprint scan thread running [p23566]
7 Thread 1134009136 (LWP 23571)  0xffffe002 in ?? () <== THREADMGMT[t1134009136]: SIH: Idle host scan thread running [p23566]
6 Thread 1142401840 (LWP 23572)  0xffffe002 in ?? () <== DNSAR(1): Address resolution thread running [p23566]
5 Thread 1150794544 (LWP 23574)  0xffffe002 in ?? () <== WEB: Server connection thread running [p23566]
4 Thread 1159416624 (LWP 23575)  0xffffe002 in ?? () <== THREADMGMT[t1159416624]: RRD: Data collection thread running [p23566]
3 Thread 1167809328 (LWP 23576)  0x4209edef in strftime () from /lib/tls/libc.so.6  <== THREADMGMT[t1167809328]: NPS(1,eth1): pcapDispatch thread running [p23566]
2 Thread 1179650864 (LWP 23584)  0xffffe002 in ?? () <== THREADMGMT[t1179650864]: RRD: Throughput data collection: Thread running [p23566]
1 Thread 1102176576 (LWP 23566)  0xffffe002 in ?? () <= THREADMGMT[t1102176576]: Now running as a daemon

(gdb) thread 9  <== the NPA thread
[Switching to thread 9 (Thread 1117223728 (LWP 23569))]#0  0xffffe002 in ?? ()
(gdb) info stack
#0  0xffffe002 in ?? ()
#1  0x400d36a6 in dequeuePacket (notUsed=0x0) at pbuf.c:2407
#2  0x4181a484 in start_thread () from /lib/tls/libpthread.so.0

(gdb) thread 3  <== the NPS thread
[Switching to thread 3 (Thread 1167809328 (LWP 23576))]#0  0x4209edef in strftime () from /lib/tls/libc.so.6
(gdb) info stack
#0  0x4209edef in strftime () from /lib/tls/libc.so.6
#1  0x4209f3ba in strftime () from /lib/tls/libc.so.6
#2  0x400e967b in traceEvent (eventTraceLevel=2, file=0x400fb08e "util.c", line=71,
    format=0x400fb880 "Error: bad magic number [expected=%d/real=%d][deviceId=%d] getFirstHost()[%s/%d]") at util.c:2529
#3  0x400e5ab5 in __getFirstHost (actualDeviceId=0, beginIdx=1, file=0x400f4420 "address.c", line=54) at util.c:71
#4  0x400bb7ad in updateDeviceHostNameInfo (addr=
              {hostFamily = 2, addr = {_hostIp4Address = {s_addr = 2228391807}, _hostIp6Address = {in6_u = {u6_addr8 = "\177\217?\204", '\0' <repeats 11 times>, u6_addr16 = {36735, 34002, 0, 0, 0, 0, 0, 0}, u6_addr32 = {2228391807, 0, 0, 0}}}}}, symbolic=0x459b3148 "132.210.143.127", actualDeviceId=0, type=19) at address.c:54
#5  0x400bb83e in updateHostNameInfo (addr=
              {hostFamily = 2, addr = {_hostIp4Address = {s_addr = 2228391807}, _hostIp6Address = {in6_u = {u6_addr8 = "\177\217?\204", '\0' <repeats 11 times>, u6_addr16 = {36735, 34002, 0, 0, 0, 0, 0, 0}, u6_addr32 = {2228391807, 0, 0, 0}}}}}, symbolic=0x459b3148 "132.210.143.127", type=19) at address.c:81
#6  0x400bc864 in ipaddr2str (hostIpAddress=
              {hostFamily = 2, addr = {_hostIp4Address = {s_addr = 2228391807}, _hostIp6Address = {in6_u = {u6_addr8 = "\177\217?\204", '\0' <repeats 11 times>, u6_addr16 = {36735, 34002, 0, 0, 0, 0, 0, 0}, u6_addr32 = {2228391807, 0, 0, 0}}}}}, updateHost=1) at address.c:871
#7  0x400c39a0 in _lookupHost (hostIpAddress=0x459b34f8, ether_addr=0x459b38c8 "", vlanId=-1, checkForMultihoming=1 '\001', forceUsingIPaddress=0 '\0',
    actualDeviceId=0, file=0x400f7eaf "pbuf.c", line=1088) at hash.c:1105
#8  0x400cf6cf in processIpPkt (bp=0x459b3926 "E", h=0x459b59a8, length=522, ether_src=0x459b38ce "", ether_dst=0x459b38c8 "", actualDeviceId=0, vlanId=-1)
    at pbuf.c:1088
#9  0x400d6a34 in processPacket (_deviceId=0x0, h=0x459b59a8, p=0x459b3918 "") at pbuf.c:3528
#10 0x400d336a in queuePacket (_deviceId=0x0, h=0x459b59a8, p=0x80c7c72 "") at pbuf.c:2317
#11 0x419a25aa in pcap_read_packet (handle=0x80c7ad0, callback=0x400d31bc <queuePacket>, userdata=0x0) at pcap-linux.c:638
#12 0x419a3b9a in pcap_loop (p=0x80c7ad0, cnt=-1, callback=0x400d31bc <queuePacket>, user=0x0) at pcap.c:81
#13 0x400ca442 in pcapDispatch (_i=0x0) at ntop.c:97
#14 0x4181a484 in start_thread () from /lib/tls/libpthread.so.0
(gdb)

      <snip>
_______________________________________________
Ntop-dev mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-dev

Reply via email to