RE: dlr-retry-queue patch

Stevenard, Kevin (Kevin) Mon, 23 Jul 2012 03:56:24 -0700

Hello Andreas,

The micro lag in the replication (I say micro because our setup is not 
overloaded and most of the time the lag does not exceed 100 ms) is not the 
main motivation for this patch. Even with a single DB server we would still 
have to retry DLRs.

Our main problem is that we are connected to various SMSCs / SMPP gateways, 
and some of them behave oddly. For example, a few of them sometimes send the 
deliver_sm before the submit_sm_resp. For some of those gateways we have seen 
that this always happens in specific cases, such as when the end user is 
unavailable or no longer reachable.

Some gateways also send deliver_sm in a random / round-robin / "load related" 
manner, or start misbehaving when we send too much load, even if we stay 
within the limits they impose.

Concerning the number of binds, we are most of the time obliged to follow the 
operator's policy, so if we are required to have 2 binds per server on two 
different SMSCs with identical IDs, then we do it.

We also like to help operators and give feedback when their SMSC or gateway is 
not behaving properly, but most of the time they can't or don't want to fix 
those kinds of issues.

For your information, even with high traffic or load spikes we don't use the 
retry queue much (at most 300 per day, and most of them are successfully 
processed on the first retry), but each DLR, positive or negative, is valuable 
to us and needs to be processed.

-----Original Message-----
From: Andreas Fink [mailto:[email protected]] 
Sent: Monday, 23 July 2012 12:10
To: Stevenard, Kevin (Kevin)
Cc: [email protected]
Subject: Re: dlr-retry-queue patch

Dear Kevin,

The DLR database entry is a temporary one. The Kannel instance which sends the 
submit_sm is also the entity which will get the delivery reports, so you should 
not have any issues with replication. The only reason you would have this is 
if you have multiple Kannels on multiple machines connecting to the same SMSC 
with the same user-id. The workaround is simply to use different user-ids so 
the SMSC will route the delivery report back to the Kannel which originally 
submitted the SMS. Then every Kannel can and should have its own private DLR 
storage. Replication might be useful for backup purposes or failover, but 
that's about it. Should one Kannel fail, another Kannel on another machine can 
be fired up with the failed Kannel's username and database access to take over 
that load.

Maybe you could explain better what you are trying to achieve...

On 23.07.2012, at 11:43, Stevenard, Kevin (Kevin) wrote:

> Hello Kannel users,
> 
> I have written a patch for the smsc_smpp connector to implement a DLR retry 
> queue. Our services are highly dependent on delivery reports. We were first 
> using a patch 
> (http://www.blogalex.com/wp-content/uploads/2009/05/kannel-dlr-retry.patch) 
> but we were not happy with the following:
> - it blocks the I/O thread, because of the sleep implemented at the DLR 
> lookup level
> - it prevents an instant deliver_sm_resp (also because the sleep blocks the 
> I/O thread)
> - the retry is system-wide and cannot be configured per account
> 
> Our setup is composed of several servers in a 'cluster'. Each server runs 
> Kannel with several smpp/emi binds and writes delivery reports to a local 
> MySQL backend. We then use master-master replication to spread DLR entries 
> to all servers.
> As we have multiple servers and multiple binds, it's possible to get a DLR 
> through a deliver_sm before we have received the submit_sm_resp containing 
> the message id, or before the DLR entry is written and replicated to the 
> other DB servers (even with a lag < 1 sec).

> 
> So the patch adds a queue per smscconn and creates for each of them a 
> dedicated thread to re-process DLRs that have not been found in the DLR 
> store. If it is not configured, no queue and no thread are created. 
> 
> To configure it:
> dlr-retry-count: Number of attempts to process a delivery report if not found 
> in the delivery reports store. Defaults to 0 times (disabled).
> dlr-retry-interval: Interval between delivery report retries. Defaults to 
> 60 seconds.
> 
> #sample in a group = smsc && smsc = smpp:
> dlr-retry-interval = 60
> dlr-retry-count = 3
> 
> 
> <dlr-retry-queue.patch>
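
For illustration only, here is a minimal Python sketch of the idea behind the
retry queue. It is not the attached patch (the real change lives in Kannel's C
code): the names DlrRetryQueue, on_deliver_sm and run_due are hypothetical, the
DLR store is reduced to a set of message ids, and the dedicated thread is
replaced by an explicit run_due(now) call so the behaviour is deterministic.
The point it shows is that the receive path never sleeps: a DLR that is not
found is queued and re-checked after dlr-retry-interval, at most
dlr-retry-count times.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingDlr:
    due: float                                   # when the next lookup should run
    seq: int                                     # tie-breaker for equal due times
    message_id: str = field(compare=False)
    attempts_left: int = field(compare=False)


class DlrRetryQueue:
    """Hypothetical per-connection retry queue (illustrative names only)."""

    def __init__(self, store, retry_count=0, retry_interval=60.0):
        self.store = store                       # assumed: set of known message ids
        self.retry_count = retry_count           # mirrors dlr-retry-count
        self.retry_interval = retry_interval     # mirrors dlr-retry-interval
        self._heap = []
        self._seq = itertools.count()
        self.processed = []                      # DLRs matched against the store
        self.dropped = []                        # DLRs given up on

    def _push(self, message_id, attempts_left, now):
        item = PendingDlr(now + self.retry_interval, next(self._seq),
                          message_id, attempts_left)
        heapq.heappush(self._heap, item)

    def on_deliver_sm(self, message_id, now):
        """Receive path: never sleeps, so deliver_sm_resp is not delayed."""
        if message_id in self.store:
            self.processed.append(message_id)
        elif self.retry_count > 0:
            self._push(message_id, self.retry_count, now)
        else:
            self.dropped.append(message_id)      # retries disabled

    def run_due(self, now):
        """Retry worker: re-check every entry whose interval has elapsed."""
        while self._heap and self._heap[0].due <= now:
            item = heapq.heappop(self._heap)
            if item.message_id in self.store:
                self.processed.append(item.message_id)
            elif item.attempts_left > 1:
                self._push(item.message_id, item.attempts_left - 1, now)
            else:
                self.dropped.append(item.message_id)


# deliver_sm arrives before the submit_sm_resp has created the DLR entry
store = set()
queue = DlrRetryQueue(store, retry_count=3, retry_interval=60.0)
queue.on_deliver_sm("abc123", now=0.0)           # not found yet: queued
store.add("abc123")                              # submit_sm_resp arrives later
queue.run_due(now=60.0)                          # first retry finds the entry
print(queue.processed)
```

With retry_count left at 0 nothing is ever queued, which mirrors the
"disabled" default described above.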
