I forgot something.
In my conversation with Arne (until he replies here himself), he mentioned that in
his case it was a push (MT) and no message was ever lost; messages were just
queued for later delivery.
That would make a lot of sense with the mechanism you propose.
Nikos
----- Original Message -----
From: "Nikos Balkanas" <[email protected]>
To: "Stipe Tolj" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, January 10, 2009 8:08 PM
Subject: Re: Kannel 1.4.2 out
Hi,
Sorry for butting in, but this is a very interesting conversation and I
couldn't resist.
I imagine that in this case you don't get SIGPIPE from trying to write to
a closed socket. But if you check the number of bytes written, you should get
an error. Alarms may be needed, because I smell timeouts on the socket.
This could provide a rollback procedure to reclaim the message and
reset the state to offline. Of course, this would only be a quick workaround
until the states are fixed, since I anticipate that the receiver will reset
it to online, and the transmitter will set it offline again.
Is this something feasible for 1.4.2?
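A minimal C sketch of the check described above, assuming a plain blocking POSIX socket. The helper name `send_all_timed()` is hypothetical (not a Kannel function), and it uses poll() with a timeout in place of alarm(). It returns -1 on error or timeout, so a caller could requeue the MT and mark the link offline. With a silent teardown, send() can still succeed while the data sits in the kernel send buffer. An error typically surfaces only once TCP retransmissions time out or the buffer fills, which is why the timeout matters.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Kannel): write len bytes of buf to fd,
 * waiting at most timeout_ms for the socket to become writable each round.
 * Returns 0 on success, -1 on error or timeout with errno set, so the caller
 * can roll back: requeue the message and mark the link offline.
 * Assumes SIGPIPE is ignored, so a dead peer yields an error, not a signal.
 */
static int send_all_timed(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;

    while (off < len) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc == 0) {
            /* Send buffer stayed full for the whole timeout: peer likely gone. */
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, poll again */
            return -1;
        }

        /* Check the byte count: a negative result is the error we want. */
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;              /* e.g. EPIPE, ECONNRESET, ETIMEDOUT */
        }
        off += (size_t)n;           /* partial write: send the rest next round */
    }
    return 0;
}
```

The process should ignore SIGPIPE (for example with `signal(SIGPIPE, SIG_IGN)`). A write to a connection the peer has closed then returns -1 with an error such as EPIPE instead of killing the process.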
Nikos
----- Original Message -----
From: "Stipe Tolj" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, January 10, 2009 7:18 PM
Subject: Re: Kannel 1.4.2 out
Alexander Malysh wrote:
As far as I know, it only happens on SMPP connections with separate receiver
and sender threads (not transceiver). Is that correct?
If so, then we need to investigate this bug first, or disable the two threads
(sender and receiver) and force users to use two SMPP connection groups
(one for the receiver and one for the sender).
Yep, I can confirm this; I have seen it on some production (high-load) systems.
The effect can be described as follows:
- An SMPP connection that uses 'port' and 'receive-port' in the smsc group.
- This means we have 1 TCP connection for the transmitter session and 1 TCP
connection for the receiver session.
- Due to the architecture design, an "smsc" can have only 1 state. Normally
this would be "online" internally, so the bearerbox abstraction layer will
consider it a valid route in its routing decisions.
- Now, the magic part: we get a so-called "silent TCP teardrop", which means
the TCP connections are "semantically disconnected" by a router in the middle,
but the TCP end-points (server, client) don't receive a corresponding TCP
FIN/RST packet. In this state, we (Kannel) "believe" we're still connected. If
we use enquire_link PDUs to verify that we "are" connected, we will notice that
we're not, drop the TCP connections, and try to re-establish the connection.
- The "fun" thing is: if the TCP teardrop happens only for the transmitter
session, the receiver session will keep getting enquire_link_resp PDUs back
from the SMSC side and keep the overall module state "online", and hence
the transmitter part does NOT re-establish the connection.
- The result: we end up with a transmitter session that seems "online" from
the perspective of the bearerbox abstraction layer, so it WILL route MTs this
way, and we keep pushing them into the "lost sink".
What we need to check is the state handling here.
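One way to picture a fix for the state handling is to track liveness per session instead of per smsc. The sketch below is a hypothetical model, not Kannel's actual data structures. Each session records when it last received an enquire_link_resp. Only the transmitter session's liveness decides whether MTs may be routed to the link, so a healthy receiver can no longer keep a dead transmitter "online". All type and function names, and the `link_timeout` parameter, are illustrative assumptions.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-session state (not Kannel's real structures). */
typedef struct {
    time_t last_enquire_resp;   /* when this session last got enquire_link_resp */
    bool   connected;
} session_state;

/* Hypothetical link with separate transmitter and receiver sessions. */
typedef struct {
    session_state tx;           /* transmitter session ('port') */
    session_state rx;           /* receiver session ('receive-port') */
    int link_timeout;           /* seconds without a response => session dead */
} smpp_link;

/* Record an enquire_link_resp on the session it actually arrived on. */
static void on_enquire_link_resp(session_state *s, time_t now)
{
    s->last_enquire_resp = now;
}

static bool session_alive(const session_state *s, time_t now, int timeout)
{
    return s->connected && (now - s->last_enquire_resp) <= timeout;
}

/* Only the transmitter's health decides whether MTs may be routed here;
 * responses seen on the receiver session do not count. */
static bool can_route_mt(const smpp_link *l, time_t now)
{
    return session_alive(&l->tx, now, l->link_timeout);
}

/* Called periodically: if the transmitter went stale, mark it disconnected
 * and report that it must be re-established, independently of the receiver. */
static bool tx_needs_reconnect(smpp_link *l, time_t now)
{
    if (l->tx.connected && !session_alive(&l->tx, now, l->link_timeout)) {
        l->tx.connected = false;
        return true;
    }
    return false;
}
```

With this split, the scenario above (stale transmitter, fresh receiver) makes `can_route_mt()` false and triggers a transmitter-only reconnect. MTs would then stay queued instead of going into the "lost sink".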
@Arne: is this pretty much the behavior you have been facing as well?
Stipe
--
-------------------------------------------------------------------
Kölner Landstrasse 419
40589 Düsseldorf, NRW, Germany
tolj.org system architecture Kannel Software Foundation (KSF)
http://www.tolj.org/ http://www.kannel.org/
mailto:st_{at}_tolj.org mailto:stolj_{at}_kannel.org
-------------------------------------------------------------------