Hi Kevin,
besides the fact that SMSCs which send the DLR before the ACK carrying the
message_id are just plain buggy and have to be fixed, I would try, for the
correctly implemented SMSCs, to return a temporary error if the DLR is not
found, so that they will retry later.
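Something along these lines right after the dlr_find() call in the deliver_sm
handling would do it (just a sketch, PDU field and constant names are from
memory and not checked against the current smsc_smpp.c):

    /* sketch: if the DLR cannot be matched yet, answer the deliver_sm with a
     * temporary error instead of ESME_ROK, so a correct SMSC re-sends it */
    SMPP_PDU *resp = smpp_pdu_create(deliver_sm_resp,
                                     pdu->u.deliver_sm.sequence_number);
    if (dlrmsg == NULL)
        resp->u.deliver_sm_resp.command_status = SMPP_ESME_RSYSERR; /* retry later */
    else
        bb_smscconn_receive(smpp->conn, dlrmsg);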
For the patch itself, here are some comments:
+ if ( smpp->dlr_retry_count > 0 ) {
+ // add the DLR into the retry queue
+ struct dlr_retry *retry = gw_malloc(sizeof(struct dlr_retry));
"//" is not a C-style comment
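i.e. it should read:

+        /* add the DLR into the retry queue */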
+static void do_dlr_retry_cleanup (SMPP *smpp)
+{
+    long i;
+    Msg *dlrmsg = NULL;
+    for (i = 0; i < gwlist_len(smpp->dlr_to_retry); i++) {
+        struct dlr_retry *dlr;
+        dlr = gwlist_get(smpp->dlr_to_retry, i);
+        if (dlr != NULL ) {
+            if ( difftime(time(NULL),dlr->retry_time) >= 0 ) {
+                debug("bb.sms.smpp", 0,"SMPP[%s]: Attempt <%ld> of <%ld> to process DLR id<%s> dst<%s>, type<%d>",
+                      octstr_get_cstr(smpp->conn->id),
+                      (dlr->dlr_retry_count+1), smpp->dlr_retry_count,
+                      octstr_get_cstr(dlr->msgid),
+                      octstr_get_cstr(dlr->destination_address), dlr->dlrstat);
+                dlrmsg = dlr_find(smpp->conn->id,
+                                  dlr->msgid, /* smsc message id */
+                                  dlr->destination_address, /* destination */
+                                  dlr->dlrstat, 1);
+                if (dlrmsg != NULL) {
+                    debug("bb.sms.smpp", 0,"SMPP[%s]: Found matching DLR id<%s> dst<%s>, type<%d>",
+                          octstr_get_cstr(smpp->conn->id), octstr_get_cstr(dlr->msgid),
+                          octstr_get_cstr(dlr->destination_address), dlr->dlrstat);
+                    dlrmsg->sms.msgdata = octstr_duplicate(dlr->msgdata);
+                    dlrmsg->sms.sms_type = report_mo;
+                    dlrmsg->sms.account = octstr_duplicate(smpp->username);
+                    bb_smscconn_receive(smpp->conn, dlrmsg);
+                    gwlist_delete(smpp->dlr_to_retry,i,1);
you don't adjust "i", therefore you will miss the next DLR in the queue
+                } else {
+                    dlr->dlr_retry_count++;
+                    if (dlr->dlr_retry_count >= smpp->dlr_retry_count) {
+                        gwlist_delete(smpp->dlr_to_retry,i,1);
ditto
+ error(0,"SMPP[%s]: Number of retry excedeed for DLR "
+ "id<%s> dst<%s>, type<%d> discarding it",
+ octstr_get_cstr(smpp->conn->id),
octstr_get_cstr(dlr->msgid),
+ octstr_get_cstr(dlr->destination_address),
dlr->dlrstat);
+ } else {
+ dlr->retry_time = time(NULL) +
smpp->dlr_retry_interval;
+ debug("bb.sms.smpp", 0,"SMPP[%s]: DLR not yet found
re-queueing it id<%s> dst<%s>, type<%d>",
+ octstr_get_cstr(smpp->conn->id),
octstr_get_cstr(dlr->msgid),
+ octstr_get_cstr(dlr->destination_address),
dlr->dlrstat);
+ }
+ }
+ }
+ }
+ }
+}
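One simple way to fix the index problem above (untested sketch) is to walk the
list backwards, so that deleting the current element cannot shift the entries
you have not looked at yet:

    /* untested sketch: iterate from the end so gwlist_delete() does not
     * shift the not-yet-visited entries */
    for (i = gwlist_len(smpp->dlr_to_retry) - 1; i >= 0; i--) {
        struct dlr_retry *dlr = gwlist_get(smpp->dlr_to_retry, i);
        /* ... existing body unchanged, gwlist_delete(smpp->dlr_to_retry, i, 1)
         * is now safe ... */
    }

Alternatively, simply decrement i right after each gwlist_delete() call.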
+/*
+ * execute do_dlr_retry_cleanup in a loop with a 1 sec sleep until the smpp
+ * conn is quitting
+ */
+static void dlr_retry_thread(void *arg)
+{
+    SMPP *smpp;
+    smpp = arg;
+    /* Make sure we log into our own log-file if defined */
+    log_thread_to(smpp->conn->log_idx);
+    while (!smpp->quitting) {
+        do_dlr_retry_cleanup(smpp);
+        gwthread_sleep(1.0);
+    }
+}
Please make the retry frequency configurable, I think 1 sec is too low for some
configurations where the DLR DB may be busy.
I see you already have dlr_retry_interval, so why not have do_dlr_retry_cleanup()
return how long to sleep?
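Something like this (sketch only, not compiled):

    /* sketch: the cleanup pass reports how long the thread should sleep,
     * e.g. the time until the earliest pending retry is due, capped at
     * smpp->dlr_retry_interval */
    static double do_dlr_retry_cleanup(SMPP *smpp)
    {
        double sleep_time = smpp->dlr_retry_interval;
        /* ... existing loop; for entries that are not due yet, remember the
         * shortest remaining wait in sleep_time ... */
        return sleep_time;
    }

    static void dlr_retry_thread(void *arg)
    {
        SMPP *smpp = arg;

        log_thread_to(smpp->conn->log_idx);
        while (!smpp->quitting)
            gwthread_sleep(do_dlr_retry_cleanup(smpp));
    }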
If we shut down the bearerbox we will be missing the following error, because
you just drop the waiting DLRs without notice:
+ error(0,"SMPP[%s]: got DLR but could not find message or was not
interested "
+ "in it id<%s> dst<%s>, type<%d>",
+ octstr_get_cstr(smpp->conn->id), octstr_get_cstr(tmp),
+ octstr_get_cstr(destination_addr), dlrstat);
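You could e.g. drain the queue on shutdown and log that same error for every
DLR that gets dropped, roughly like this (sketch, struct field names taken from
your patch):

    /* sketch: called from the smsc_smpp shutdown path; drains the retry
     * queue and reports every DLR that is being dropped */
    static void dlr_retry_queue_destroy(SMPP *smpp)
    {
        struct dlr_retry *dlr;

        while ((dlr = gwlist_extract_first(smpp->dlr_to_retry)) != NULL) {
            error(0, "SMPP[%s]: got DLR but could not find message or was not "
                  "interested in it id<%s> dst<%s>, type<%d>",
                  octstr_get_cstr(smpp->conn->id), octstr_get_cstr(dlr->msgid),
                  octstr_get_cstr(dlr->destination_address), dlr->dlrstat);
            octstr_destroy(dlr->msgid);
            octstr_destroy(dlr->destination_address);
            octstr_destroy(dlr->msgdata);
            gw_free(dlr);
        }
        gwlist_destroy(smpp->dlr_to_retry, NULL);
    }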
Thanks,
Alex
On 23.07.2012, at 12:55, "Stevenard, Kevin (Kevin)"
<[email protected]> wrote:
> Hello Andreas,
>
> The problem of the micro lag in the replication (I say micro because our
> setup is not overloaded and most of the time the lag does not exceed 100ms)
> is not the main motivation for this patch. Even with a single DB server we
> still have to retry DLRs.
>
> Our main problem is that we are connected to various SMSC / SMPP gateways and
> some of them have weird behavior, for example a few of them sometimes send the
> deliver_sm before the submit_sm_resp ... for some of those gateways we have
> seen that it always happens in specific cases, like the end-user not being
> available or no longer reachable.
>
> Some gateways also send deliver_sm in a random / round-robin / "load related"
> manner, or start to go crazy when we send too much load, even if we stay
> within the boundaries they impose.
>
> Concerning the number of binds, we are most of the time obliged to follow the
> operator's policy, so if we have to have 2 binds per server on two different
> SMSCs with identical IDs, then we do it...
>
> We also like to help operators and give feedback when their SMSC or gateway
> is not acting properly, but most of the time they can't or don't want to fix
> those kinds of issues.
>
> For your information, even with high traffic or spikes of load we don't use
> that retry queue much (max of 300 / day, and most of them are successfully
> processed on the first retry), but each DLR, positive or negative, is valuable
> for us and needs to be processed.
>
> -----Original Message-----
> From: Andreas Fink [mailto:[email protected]]
> Sent: lundi 23 juillet 2012 12:10
> To: Stevenard, Kevin (Kevin)
> Cc: [email protected]
> Subject: Re: dlr-retry-queue patch
>
> Dear Kevin,
>
> The DLR database entry is a temporary one. The Kannel instance which sends
> the submit_sm is the entity which will also get the delivery reports. So you
> should not have any issues with replication. The only reason why you would
> have this is if you have multiple Kannels on multiple machines connecting to
> the same SMSC with the same user-id. The workaround is simply to use different
> user-ids so the SMSC will route the delivery report back to the Kannel which
> originally submitted the SMS. Then every Kannel can and should have its
> private DLR storage. Replication might be useful for backup purposes or
> failover, but that's about it. Should one Kannel fail, another Kannel on
> another machine can be fired up with the failed Kannel's username and
> database access to take over that load.
>
> Maybe you can explain better what you are trying to achieve...
>
>
>
> On 23.07.2012, at 11:43, Stevenard, Kevin (Kevin) wrote:
>
>> Hello Kannel users,
>>
>> I have written a patch for the smsc_smpp connector to implement a DLR retry
>> queue. In our services we are highly dependent on delivery reports. We were
>> first using a patch
>> (http://www.blogalex.com/wp-content/uploads/2009/05/kannel-dlr-retry.patch)
>> but we were not happy with the following:
>> - it blocks the IO thread due to the sleep implemented at the DLR search level
>> - it prevents an instant deliver_sm_resp (also due to the sleep blocking the
>> IO thread)
>> - the retry is system-wide and cannot be configured per account
>>
>> As our setup is composed of several servers in a 'cluster', each server is
>> running Kannel with several smpp/emi binds and each server is writing
>> delivery reports to a local MySQL backend. We then use master-master
>> replication to spread the DLR entries to all servers.
>> As we have multiple servers and multiple binds, it is possible to get a DLR
>> through a deliver_sm before we have received the submit_sm_resp containing
>> the message id, or before the DLR entry is written and replicated to the
>> other DB servers (even with a lag < 1 sec).
>
>>
>> So the patch adds a queue per smscconn and creates for each of them a
>> dedicated thread to re-process DLRs that have not been found in the DLR
>> store; if it is not configured, no queue and no thread are created.
>>
>> To configure it:
>> dlr-retry-count: Number of attempts to process a delivery report if it is
>> not found in the delivery report store. Defaults to 0 (disabled).
>> dlr-retry-interval: This timer specifies the interval between delivery
>> report retries. Defaults to 60 seconds.
>>
>> #sample in a group = smsc && smsc = smpp:
>> dlr-retry-interval = 60
>> dlr-retry-count = 3
>>
>>
>> <dlr-retry-queue.patch>
>
>
>