Hi Alan,
I wholeheartedly agree. Consider the case where it cannot find a specific
DLR while everything else (DB, etc.) is fine: it shouldn't panic. Not
panicking is the normal behaviour with the other DB drivers when people have
the wrong msg-id-type, when a DLR entry is lost on bb restart, or when the
SMSC sends the DLR before the submit_sm_resp carrying the DLR id.
The statement:
    if (result == NULL...)
    {
        while(row = gwlist_extract_first(result)) -> lock(result)
doesn't make any sense at all. The sooner it is replaced, the better.
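To illustrate the kind of guard being discussed, here is a minimal sketch. It does not use Kannel's real gwlib API and it is not Alan's patch: DemoList, demo_extract_first and demo_count_rows are hypothetical stand-ins made up for this example. The only behaviours taken from the thread are that extracting from a NULL list is a hard failure, and that the patched status page reports -1 when the count cannot be obtained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gwlib list; NOT Kannel's real List type. */
typedef struct {
    void **items;   /* row pointers */
    size_t len;     /* number of rows */
    size_t head;    /* index of the next row to extract */
} DemoList;

/* Mimics the behaviour seen in the log: a NULL list is a hard failure. */
static void *demo_extract_first(DemoList *list)
{
    assert(list != NULL);
    if (list->head >= list->len)
        return NULL;            /* list exhausted */
    return list->items[list->head++];
}

/* Hypothetical caller: returns the number of rows consumed, or -1 when
 * the query produced no result list at all (e.g. the select failed). */
static long demo_count_rows(DemoList *result)
{
    long n = 0;

    if (result == NULL)
        return -1;              /* report the failure, never touch the list */

    while (demo_extract_first(result) != NULL)
        n++;
    return n;
}
```

With this shape, a failed select (result == NULL) yields -1 to the caller instead of reaching the assertion, while a valid list is drained as before.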
+1
BR,
Nikos
----- Original Message -----
From: "Alan McNatty" <a...@catalyst.net.nz>
To: "Nikos Balkanas" <nbalka...@gmail.com>
Cc: <devel@kannel.org>
Sent: Wednesday, August 10, 2011 12:08 PM
Subject: Re: panic inducing use of gwlist_extract_first in dlr_pgsql.c
Hi Nikos,
So do you agree that we should avoid panic'ing as a result of a
temporary situation (as outlined with db connection dropping)? That is -
the patch is good?
Cheers,
Alan
On Wed, 2011-08-10 at 11:03 +0300, Nikos Balkanas wrote:
That is a well-known behavior. Bb crashes and stops responding to the
heartbeats that smsbox sends. As a result, smsbox logs "bearerbox
gone, shutting down" and shuts down. The parent bb process should
handle heartbeats, not the child.
HTH,
Nikos
On Wed, Aug 10, 2011 at 9:56 AM, Alan McNatty <a...@catalyst.net.nz>
wrote:
Also note: a side effect of the current behaviour (panic when the DB is
temporarily unavailable) when --parachute is used at start-up is that
smsbox will be shut down as a result of the panic, but bearerbox will come
back online when the DB is available again (as the --parachute keeps it
alive). The result is a running bearerbox without any smsbox(es)
attached.
On Wed, 2011-08-10 at 16:51 +1200, Alan McNatty wrote:
> Hi All,
>
> I'm finding what I think is incorrect use of gwlist_extract_first in the
> postgres dlr implementations (it may also exist in others - I've not
> checked yet). The DLR methods issue 'error's when they fail to return
> results, etc but subsequent calls to gwlist_extract_first on NULL lists
> cause 'panic's.
>
> What I'm testing is the situation where the DLR DB is available on
> start-up (we panic if it is not). If during normal operation the
> database is shut down or temporarily unavailable (network issue, etc.),
> the failed select is logged as an error but then results in a panic.
>
> 2011-08-10 16:37:43 [18552] [3] ERROR: PGSQL: SELECT count(*) FROM "dlr";
> 2011-08-10 16:37:43 [18552] [3] ERROR: PGSQL: FATAL: terminating connection due to administrator command
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> 2011-08-10 16:37:43 [18552] [3] ERROR: PGSQL: Select failed!
> 2011-08-10 16:37:43 [18552] [3] ERROR: PGSQL: Could not get count of DLR table
> 2011-08-10 16:37:43 [18552] [3] PANIC: gwlib/list.c:309: gwlist_extract_first: Assertion `list != NULL' failed.
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox(gw_panic+0x14b) [0x48b55b]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox(gwlist_extract_first+0x94) [0x489874]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox [0x41e3d3]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox(bb_print_status+0x11d) [0x40edfd]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox [0x415075]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /usr/sbin/bearerbox [0x4823cf]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /lib/libpthread.so.0 [0x2b0e670a9fc7]
> 2011-08-10 16:37:43 [18552] [3] PANIC: /lib/libc.so.6(clone+0x6d) [0x2b0e67a8664d]
>
> The attached patch addresses this (for the postgres implementation only - I
> can check the others if required). Once applied, the result on the status
> page is:
>
> DLR: -1 queued, using pgsql storage
>
> And when a DLR is received ...
>
> 2011-08-10 16:44:53 [18889] [11] ERROR: PGSQL: FATAL: terminating connection due to administrator command
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> 2011-08-10 16:44:53 [18889] [11] ERROR: PGSQL: Select failed!
> 2011-08-10 16:44:53 [18889] [11] DEBUG: no rows found
> 2011-08-10 16:44:53 [18889] [11] WARNING: DLR[pgsql]: DLR from SMSC<FOO> for DST<02xxxxxxxxx> not found.
> 2011-08-10 16:44:53 [18889] [11] ERROR: SMPP[FOO]: got DLR but could not find message or was not interested in it id<534001841355> dst<02xxxxxxxxx>, type<1>
>
> Cheers,
> Alan