Hi Dante,

I just finished a 130000 MO (no DLR) run under valgrind. It took almost forever. I used the file store to replicate your conditions. First I kept smsbox disabled until all messages had ended up in the queue (30 MB). Then I started smsbox and let it run until all SMS were processed. I ended up using drive_smpp as the SMSC emulator.
I wasn't able to replicate the error. Valgrind came back clean. The system was a SUSE 10.1 32-bit Linux on a desktop PC. You might want to put some print statements (info) in the code at the problem location, to get at least more information the next time it happens. I would try spool storage.

BR,
Nikos

----- Original Message -----
From: Dante Moreno
To: Nikos Balkanas
Cc: [email protected]
Sent: Friday, September 11, 2009 4:34 PM
Subject: Re: PANIC bearerbox cvs-20090902

Hi Nikos,

I can't use the spool store-type right now since kannel runs on an ext3 filesystem. I remember reading that there were performance issues if you have a large queue + spool + ext3. If I have no other choice, I can partition the system and create an ext2 or xfs partition just for the queue. However, I want to do that only as a last resort (and hope that the problem really doesn't happen again).

On the other hand, there are a couple of good free SMPP SMSC simulators: SMPPSim (free and open source) or Logica's simulator, for example.

Regards,
Dante

2009/9/11 Nikos Balkanas <[email protected]>

Hi,

I cannot find anything wrong with the code at that point. However, it looks like memory corruption. Could you please use spool instead of file? It is safer, more efficient and faster than file. In addition, it uses different memory structures than file, so you should get away with it.

I will update and run valgrind on it over the weekend. Unfortunately, I don't have SMSC connections, but I hope I can catch the problem with a fake SMSC. If not, someone else from the list will have to look at it.

BR,
Nikos

----- Original Message -----
From: Nikos Balkanas
To: Dante Moreno
Cc: [email protected]
Sent: Thursday, September 10, 2009 10:44 PM
Subject: Re: PANIC bearerbox cvs-20090902

Hi,

How can you say they are the same? Even the problem is different this time. Anyway, I'll have to look at it.
BR,
Nikos

----- Original Message -----
From: Dante Moreno
To: Nikos Balkanas
Cc: [email protected]
Sent: Thursday, September 10, 2009 8:12 PM
Subject: Re: PANIC bearerbox cvs-20090902

Hi Nikos,

Thanks for answering. First of all, I'm using the file store-type option and there is plenty of free disk space. I'm using the latest CVS (cvs-20090902).

Here are the logs + addr2line output:

2009-09-10 09:10:40 [27966] [18] PANIC: gwlib/octstr.c:2484: seems_valid_real: Assertion `ostr != NULL' failed. (Called from gwlib/octstr.c:874:octstr_compare.)
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox(gw_panic+0x15b) [0x4833db]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x483c59]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox(octstr_compare+0x20) [0x488800]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x477292]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox(gwlist_search+0x54) [0x4811d4]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox(dict_get+0x35) [0x4772d5]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x4175f0]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x4177ef]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x417df5]
2009-09-10 09:10:40 [27966] [18] PANIC: bearerbox [0x47a2f5]
2009-09-10 09:10:40 [27966] [18] PANIC: /lib64/libpthread.so.0 [0x3781e06307]
2009-09-10 09:10:40 [27966] [18] PANIC: /lib64/libc.so.6(clone+0x6d) [0x37812d1ded]

and here the addr2line output:

addr2line -e /gateway-1.4.3_cvs_20090902/gw/bearerbox 0x4833db 0x483c59 0x488800 0x477292 0x4811d4 0x4772d5 0x4175f0 0x4177ef 0x417df5 0x47a2f5 0x3781e06307 0x37812d1ded
/gateway-1.4.3_cvs_20090902/gwlib/log.c:541
/gateway-1.4.3_cvs_20090902/gwlib/octstr.c:2483
/gateway-1.4.3_cvs_20090902/gwlib/octstr.c:875
/gateway-1.4.3_cvs_20090902/gwlib/dict.c:103
/gateway-1.4.3_cvs_20090902/gwlib/list.c:472
/gateway-1.4.3_cvs_20090902/gwlib/dict.c:298
/gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:196
/gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:571
/gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:236
/gateway-1.4.3_cvs_20090902/gwlib/gwthread-pthread.c:135
??:0
??:0

The line numbers seem to be the same as before.

Regards,
Dante

2009/9/10 Nikos Balkanas <[email protected]>

Hi,

No, this is the right place for debugger info. First make sure that your partition is not getting full and that kannel has space to write the queue. It seems you are using the spool type for queue storage and it runs out of unique hash strings. But I cannot be sure, since your addr2line output is from an older CVS and reports wrong line numbers. Please update to the latest CVS and repost.

BR,
Nikos

----- Original Message -----
From: Dante Moreno
To: [email protected]
Sent: Thursday, September 10, 2009 5:20 PM
Subject: Re: PANIC bearerbox cvs-20090902

Maybe I should post this to the users list? We are now facing this problem on a daily basis. Any help would be greatly appreciated.

Regards,
Dante

2009/9/8 Dante Moreno <[email protected]>

Hi,

We are using the latest CVS and have found these PANIC bugs. This has happened to us 3 times in around two weeks. We are not able to reproduce them. The only thing we know is that it seems to happen when the store size is very large (100,000+ messages). We are using the "file" store type. Below are the bug reports.

The first one is:

2009-08-14 12:29:12 [4472] [15] DEBUG: boxc_receiver: sms received
2009-08-14 12:29:13 [4472] [14] PANIC: gwlib/octstr.c:2505: seems_valid_real: Assertion `ostr->data[ostr->len] == '\0'' failed. (Called from gwlib/octstr.c:343:octstr_len.)
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox(gw_panic+0x15b) [0x4830db]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x4837a5]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox(octstr_len+0x1f) [0x483aef]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox(octstr_hash_key+0x2f) [0x483b8f]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x476e8c]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox(dict_get+0x1c) [0x476fbc]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x4175b0]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x4177af]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x417db5]
2009-08-14 12:29:13 [4472] [14] PANIC: bearerbox [0x479ff5]
2009-08-14 12:29:13 [4472] [14] PANIC: /lib64/libpthread.so.0 [0x3781e06307]
2009-08-14 12:29:13 [4472] [14] PANIC: /lib64/libc.so.6(clone+0x6d) [0x37812d1ded]

addr2line -e /gateway-1.4.3/gw/bearerbox 0x4830db 0x4837a5 0x483aef 0x483b8f 0x476e8c 0x476fbc 0x4175b0 0x4177af 0x417db5 0x479ff5 0x3781e06307 0x37812d1ded
/gateway-1.4.3/gwlib/log.c:541
/gateway-1.4.3/gwlib/octstr.c:2507
/gateway-1.4.3/gwlib/octstr.c:344
/gateway-1.4.3/gwlib/octstr.c:2468
/gateway-1.4.3/gwlib/dict.c:139
/gateway-1.4.3/gwlib/dict.c:294
/gateway-1.4.3/gw/bb_store_file.c:196
/gateway-1.4.3/gw/bb_store_file.c:571
/gateway-1.4.3/gw/bb_store_file.c:236
/gateway-1.4.3/gwlib/gwthread-pthread.c:135
??:0
??:0

And the second one which happened today:

2009-09-08 09:56:01 [25766] [18] PANIC: gwlib/octstr.c:2484: seems_valid_real: Assertion `ostr != NULL' failed. (Called from gwlib/octstr.c:874:octstr_compare.)
2009-09-08 09:56:02 [25766] [21] DEBUG: boxc_receiver: sms received
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox(gw_panic+0x15b) [0x4833db]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x483c59]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox(octstr_compare+0x20) [0x488800]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x477292]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox(gwlist_search+0x54) [0x4811d4]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox(dict_get+0x35) [0x4772d5]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x4175f0]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x4177ef]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x417df5]
2009-09-08 09:56:02 [25766] [18] PANIC: bearerbox [0x47a2f5]
2009-09-08 09:56:02 [25766] [18] PANIC: /lib64/libpthread.so.0 [0x3781e06307]
2009-09-08 09:56:02 [25766] [18] PANIC: /lib64/libc.so.6(clone+0x6d) [0x37812d1ded]

addr2line -e gateway-1.4.3_cvs_20090902/gw/bearerbox 0x4833db 0x483c59 0x488800 0x477292 0x4811d4 0x4772d5 0x4175f0 0x4177ef 0x417df5 0x47a2f5 0x3781e06307 0x37812d1ded
gateway-1.4.3_cvs_20090902/gwlib/log.c:541
gateway-1.4.3_cvs_20090902/gwlib/octstr.c:2483
gateway-1.4.3_cvs_20090902/gwlib/octstr.c:875
gateway-1.4.3_cvs_20090902/gwlib/dict.c:103
gateway-1.4.3_cvs_20090902/gwlib/list.c:472
gateway-1.4.3_cvs_20090902/gwlib/dict.c:298
gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:196
gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:571
gateway-1.4.3_cvs_20090902/gw/bb_store_file.c:236
gateway-1.4.3_cvs_20090902/gwlib/gwthread-pthread.c:135
??:0
??:0

Also, for some strange reason, after the PANIC bearerbox restarts itself (parachute) but smsbox doesn't.

Could anybody please give me a hint on how to solve these issues?

Regards,
Dante
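
A note for anyone repeating the addr2line step from the reports above: the frame addresses can be pulled out of the PANIC lines with a small script instead of by hand. The following is only a minimal sketch in Python, assuming the backtrace line format shown in these logs ("PANIC: <module>[(symbol+offset)] [0xADDR]"). The names extract_frames and addr2line_command are hypothetical helpers, not part of Kannel. The sketch skips the libpthread and libc frames on the assumption that they cannot be resolved against the bearerbox binary, which would be consistent with the ??:0 entries in the outputs above.

```python
import re
import shlex

# A backtrace frame looks like:
#   ... PANIC: bearerbox(gw_panic+0x15b) [0x4833db]
#   ... PANIC: bearerbox [0x483c59]
# The assertion line itself does not end in "[0x...]" and is not matched.
FRAME_RE = re.compile(r"PANIC: (\S+?)(?:\([^)]*\))? \[(0x[0-9a-fA-F]+)\]\s*$")


def extract_frames(log_text):
    """Return (module, address) pairs for each backtrace frame, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames


def addr2line_command(log_text, binary_path, module="bearerbox"):
    """Build an addr2line command line for the frames belonging to `module`."""
    addrs = [addr for mod, addr in extract_frames(log_text) if mod == module]
    return "addr2line -e " + shlex.quote(binary_path) + " " + " ".join(addrs)
```

Calling addr2line_command(log_text, "/gateway-1.4.3_cvs_20090902/gw/bearerbox") on the pasted log text returns a command string to run in a shell. The binary must be the exact build that produced the PANIC, otherwise the reported line numbers will not match the source.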
