I've been testing Alex's TLV patch with the meta-data branch, with
initial success (I was able to read and set mblox TLVs for a
US-specific bind, as per Kyriacos's mblox TLV config - thanks for
that!). It's been performing perfectly on our test server, with a
small number of test binds carrying test traffic.
However, when I deployed that build to production, bearerbox would
crash shortly after startup (after a varying delay, sometimes 1
second, sometimes 10 or so). It would segfault with no indication in
bearer.log (just a 'Connection closed by the bearerbox' in
smsbox.log) - the only indication was in /var/log/messages, for
example:
Jan 21 23:46:27 smsgw2 kernel: bearerbox[10180]: segfault at 0000000000000118 rip 000000000044c229 rsp 000000005ea30060 error 4
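As an aside, the fields in a kernel line like the one above can be decoded mechanically. The following is a minimal sketch in Python, not part of Kannel or any kernel tool: the function name `decode_segfault`, the regular expression, and the `near_null_limit` heuristic are all assumptions made for illustration. It relies only on the standard x86 page-fault error-code bits (bit 0 = protection violation, bit 1 = write, bit 2 = user mode).

```python
import re

# x86 page-fault error-code bits as reported in the kernel's segfault line.
PF_PROT = 0x1   # set: protection violation; clear: page not present
PF_WRITE = 0x2  # set: write access; clear: read access
PF_USER = 0x4   # set: fault happened in user mode

# Hypothetical pattern; \s+ tolerates the line having been wrapped by a mailer.
SEGFAULT_RE = re.compile(
    r"(?P<prog>[^\s\[]+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-fA-F]+)"
    r"\s+rip\s+(?P<rip>[0-9a-fA-F]+)\s+rsp\s+(?P<rsp>[0-9a-fA-F]+)"
    r"\s+error\s+(?P<err>[0-9a-fA-F]+)"
)


def decode_segfault(line, near_null_limit=0x1000):
    """Decode one kernel segfault line into a dict of fields.

    near_null_limit is an assumed heuristic (one 4 KiB page), not a kernel rule.
    """
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    addr = int(m.group("addr"), 16)
    err = int(m.group("err"), 16)  # assumed to be printed in hex
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_addr": addr,
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "user_mode": bool(err & PF_USER),
        "write": bool(err & PF_WRITE),
        "page_present": bool(err & PF_PROT),
        "near_null": addr < near_null_limit,
    }


if __name__ == "__main__":
    line = (
        "Jan 21 23:46:27 smsgw2 kernel: bearerbox[10180]: segfault at "
        "0000000000000118 rip 000000000044c229 rsp 000000005ea30060 error 4"
    )
    for key, value in decode_segfault(line).items():
        print(key, value)
```

Read this way, "error 4" is a user-mode read of a page that is not mapped, and the fault address 0x118 lies within the first page of memory. That pattern is typical of reading a struct member through a NULL pointer, although on its own it does not prove that is what happened here.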
I initially thought it might be a 32-bit/64-bit thing (the test
server is 32-bit, production is 64-bit), but I couldn't reproduce the
problem on another 64-bit machine, and recompiling with gcc4 didn't
help. (gcc 3.4 can apparently produce incorrect instructions on
64-bit machines in some rare cases?) Turning on debug logging didn't
show anything useful either.
I recompiled Kannel with --with-defaults=debug and attached gdb to
bearerbox to see if I could get anything useful when it segfaulted -
here's the output from gdb:
...
2008-01-21 23:59:28 [18588] [49] DEBUG: SMPP[optusfrmt]: Got PDU:
2008-01-21 23:59:28 [18588] [49] DEBUG: SMPP PDU 0x6bcff60 dump:
2008-01-21 23:59:28 [18588] [49] DEBUG: type_name: deliver_sm
2008-01-21 23:59:28 [18588] [49] DEBUG: command_id: 5 = 0x00000005
2008-01-21 23:59:28 [18588] [49] DEBUG: command_status: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: sequence_number: 1 = 0x00000001
2008-01-21 23:59:28 [18588] [49] DEBUG: service_type: "NOREP"
2008-01-21 23:59:28 [18588] [49] DEBUG: source_addr_ton: 1 = 0x00000001
2008-01-21 23:59:28 [18588] [49] DEBUG: source_addr_npi: 1 = 0x00000001
2008-01-21 23:59:28 [18588] [49] DEBUG: source_addr: "XXXX"
2008-01-21 23:59:28 [18588] [49] DEBUG: dest_addr_ton: 2 = 0x00000002
2008-01-21 23:59:28 [18588] [49] DEBUG: dest_addr_npi: 8 = 0x00000008
2008-01-21 23:59:28 [18588] [49] DEBUG: destination_addr: "19774777"
2008-01-21 23:59:28 [18588] [49] DEBUG: esm_class: 4 = 0x00000004
2008-01-21 23:59:28 [18588] [49] DEBUG: protocol_id: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: priority_flag: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: schedule_delivery_time: NULL
2008-01-21 23:59:28 [18588] [49] DEBUG: validity_period: NULL
2008-01-21 23:59:28 [18588] [49] DEBUG: registered_delivery: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: replace_if_present_flag: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: data_coding: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: sm_default_msg_id: 0 = 0x00000000
2008-01-21 23:59:28 [18588] [49] DEBUG: sm_length: 122 = 0x0000007a
2008-01-21 23:59:28 [18588] [49] DEBUG: short_message:
2008-01-21 23:59:28 [18588] [49] DEBUG: Octet string at 0x6be5f30:
2008-01-21 23:59:28 [18588] [49] DEBUG: len: 122
2008-01-21 23:59:28 [18588] [49] DEBUG: size: 123
2008-01-21 23:59:28 [18588] [49] DEBUG: immutable: 0
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 69 64 3a 31 34 32 37 31 35 39 33 37 36 20 73 75 id:1427159376 su
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 62 3a 30 30 31 20 64 6c 76 72 64 3a 30 30 31 20 b:001 dlvrd:001
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 73 75 62 6d 69 74 20 64 61 74 65 3a 30 38 30 31 submit date:0801
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 32 31 32 33 35 39 20 64 6f 6e 65 20 64 61 74 65 212359 done date
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 3a 30 38 30 31 32 31 32 33 35 39 20 73 74 61 74 :0801212359 stat
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 3a 44 45 4c 49 56 52 44 20 65 72 72 3a 30 30 30 :DELIVRD err:000
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 20 74 65 78 74 3a 43 68 20 37 3a 20 54 68 6e 78 text:Ch 7: Thnx
2008-01-21 23:59:28 [18588] [49] DEBUG: data: 20 34 20 65 6e 74 65 72 69 6e 4 enterin
2008-01-21 23:59:28 [18588] [49] DEBUG: Octet string dump ends.
2008-01-21 23:59:28 [18588] [49] DEBUG: SMPP PDU dump ends.
2008-01-21 23:59:28 [18588] [49] DEBUG: SMPP[optusfrmt] handle_pdu, got DLR
2008-01-21 23:59:28 [18588] [49] DEBUG: DLR[pgsql]: Looking for DLR smsc=optusfrmt, ts=1427159376, dst=XXXX, type=1
2008-01-21 23:59:28 [18588] [49] DEBUG: sql: SELECT mask, service, url, source, destination, boxc FROM dlr WHERE smsc='optusfrmt' AND ts='1427159376' LIMIT 1;
2008-01-21 23:59:28 [18588] [49] DEBUG: Found entry, col1=31, col2=apg, col3=http://apg:8888/dlr/kannel?i=44921898&t=%T&c=%d&m=%A, col4=19774777, col5=XXXX col6=
2008-01-21 23:59:28 [18588] [49] DEBUG: DLR[pgsql]: created DLR message for URL <http://apg:8888/dlr/kannel?i=44921898&t=%T&c=%d&m=%A>
2008-01-21 23:59:28 [18588] [49] DEBUG: removing DLR from database
2008-01-21 23:59:28 [18588] [49] DEBUG: sql: DELETE FROM dlr WHERE smsc='optusfrmt' AND ts='1427159376';
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1577253216 (LWP 18639)]
0x000000000044ebc1 in handle_pdu (smpp=0x2a95c0e3d0, conn=0x6b4f8c0, pdu=0x6bcff60, pending_submits=0x5e02f0f0) at gw/smsc/smsc_smpp.c:1460
1460 meta_data_set_values(msg->sms.meta_data, pdu->u.deliver_sm.tlv, "smpp");
So it seems the segfault is triggered by TLV handling for a certain
kind of DLR? (We had DLRs coming in on our test binds, and they
didn't cause a problem.) I'm a complete GDB noob, so if there's
anything else I can do to provide more information, please let me
know.
Any ideas why that meta_data_set_values call would die with that
DLR? Any assistance would be greatly appreciated!
FYI, Kannel details are:
Kannel bearerbox version `cvs-20071018'. Build `Jan 21 2008
23:52:11', compiler `3.4.6 20060404 (Red Hat 3.4.6-9)'. System Linux,
release 2.6.9-55.0.2.ELsmp, version #1 SMP Tue Jun 26 14:14:47 EDT
2007, machine x86_64. Hostname smsgw2.appgw.mnetcorporation.com, IP
10.110.123.31. Libxml version 2.6.16. Using checking malloc.
Thanks,
--
Giulio Harding
Systems Administrator
m.Net Corporation
Level 2, 8 Leigh Street
Adelaide SA 5000, Australia
Tel: +61 8 8210 2041
Fax: +61 8 8211 9620
Mobile: 0432 876 733
Yahoo: giulio.harding
MSN: [EMAIL PROTECTED]
http://www.mnetcorporation.com