Hi,
Sorry for butting in, but this is a very interesting conversation and I
couldn't resist.
I imagine that in this case you don't get SIGPIPE from trying to write to a
closed socket. But if you check the bytes written you should be getting an
error. Alarms may be needed because I smell timeouts on the socket. This
could provide for a rollback procedure to reclaim the message and reset the
state to offline. Of course this would only be a quick workaround until the
states are fixed, since I anticipate that the receiver will reset it to
online, and the transmitter to offline again.
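To make the idea concrete, here is a minimal sketch in Python (not Kannel's C code, and not a patch). It assumes a hypothetical `Session` wrapper around the transmitter socket; the names `Session`, `send_pdu`, `requeued` and `demo` are made up for illustration. A socket timeout stands in for the alarm, and a local socket pair stands in for the SMSC link, so the error shows up immediately here, whereas on a silently dropped TCP link it may only appear after the timeout.

```python
import socket


class Session:
    """Hypothetical stand-in for the transmitter side of an smsc link."""

    def __init__(self, sock, timeout=5.0):
        self.sock = sock
        # The socket timeout plays the role of the alarm mentioned above.
        self.sock.settimeout(timeout)
        self.state = "online"
        self.requeued = []  # messages reclaimed after a failed write

    def send_pdu(self, pdu):
        """Try to write one PDU; on failure reclaim it and go offline."""
        try:
            # CPython ignores SIGPIPE by default, so a write to a closed
            # peer surfaces as an exception instead of killing the process.
            self.sock.sendall(pdu)
            return True
        except OSError:
            # OSError covers broken pipe, connection reset and timeouts.
            self.requeued.append(pdu)  # rollback: keep the message
            self.state = "offline"     # stop routing through this session
            return False


def demo():
    # A connected local socket pair simulates the link to the SMSC.
    ours, theirs = socket.socketpair()
    session = Session(ours, timeout=1.0)
    first = session.send_pdu(b"submit_sm #1")   # peer is still open
    theirs.close()                              # peer goes away
    second = session.send_pdu(b"submit_sm #2")  # write now fails
    ours.close()
    return first, second, session.state, session.requeued
```

In this sketch the second message is not lost: it ends up in `requeued` and the state flips to offline, which is the rollback behaviour described above.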
Is this something feasible for 1.4.2?
Nikos
----- Original Message -----
From: "Stipe Tolj" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, January 10, 2009 7:18 PM
Subject: Re: Kannel 1.4.2 out
Alexander Malysh wrote:
As far as I know it only happens on SMPP connections with receiver and
sender thread (not transceiver). Is it correct?
If yes, then we need to investigate this bug first or disable 2 Threads
(sender and receiver) and force users to use 2 SMPP connection groups
(one for receiver and one for sender).
Yep, I can confirm this, as seen on some production (high-load) systems.
The effect can be described as follows:
- An SMPP connection that uses 'port' and 'receive-port' in the smsc group.
- Which means we have 1 TCP connection for the transmitter session, and 1
  TCP connection for the receiver session.
- Due to architecture design, a "smsc" can have only 1 state. Normally this
  would be "online" internally, so the abstraction layer of bearerbox will
  consider this a valid route in its routing decision.
- Now, the magic part: we get a so-called "silent TCP teardrop", which means
  the TCP connections are "semantically disconnected" by a middle router,
  but the TCP end-points (server, client) don't get a corresponding TCP drop
  packet. In this state we (Kannel) "believe" we're still connected. If we
  use enquire_link PDUs to ensure we "are" connected, we will notice we're
  not and drop the TCP connections, trying to re-establish the connection.
- The "fun" thing is: if the TCP teardrop happens only for the transmitter
  session, the receiver session will still keep getting enquire_link_resp
  PDUs back from the SMSC side and keep the overall module state in
  "online", and hence the transmitter part is NOT re-establishing the
  connection.
- The result: we end up with a transmitter session that seems "online" from
  the perspective of the bearerbox abstraction layer, so it WILL route MTs
  this way, and we keep pushing them into the "lost sink".
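The effect in the list above can be reproduced with a toy model. This is only a sketch under assumptions: it is written in Python rather than Kannel's C, it is not how bearerbox is implemented, and the class names `SharedStateSmsc` and `PerSessionSmsc` and the `replay` helper are hypothetical. It contrasts one shared state for both sessions with one state per session.

```python
class SharedStateSmsc:
    """One state for both sessions, as in the behaviour described above."""

    def __init__(self):
        self.state = "online"

    def on_enquire_link_resp(self, session):
        # Any session that hears back from the SMSC marks the smsc online.
        self.state = "online"

    def on_enquire_link_timeout(self, session):
        self.state = "offline"

    def routable(self):
        return self.state == "online"


class PerSessionSmsc:
    """Hypothetical alternative: track each session's state separately."""

    def __init__(self):
        self.states = {"transmitter": "online", "receiver": "online"}

    def on_enquire_link_resp(self, session):
        self.states[session] = "online"

    def on_enquire_link_timeout(self, session):
        self.states[session] = "offline"

    def routable(self):
        # MT traffic only needs the transmitter session to be alive.
        return self.states["transmitter"] == "online"


def replay(smsc, events):
    """Feed (kind, session) events into a model; return its routability."""
    for kind, session in events:
        if kind == "resp":
            smsc.on_enquire_link_resp(session)
        else:
            smsc.on_enquire_link_timeout(session)
    return smsc.routable()


# The transmitter link dies silently while the receiver keeps answering.
EVENTS = [("timeout", "transmitter"), ("resp", "receiver")]

shared_routable = replay(SharedStateSmsc(), EVENTS)
per_session_routable = replay(PerSessionSmsc(), EVENTS)
```

With the shared state, the receiver's response overwrites the transmitter's timeout and the route still looks valid; with per-session states, the dead transmitter stays visible. Whether separate states fit the existing smsc abstraction is an open question here, not something this sketch settles.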
What we need to check is the state handling here.
@Arne: is this pretty much the behavior you have also been facing?
Stipe
--
-------------------------------------------------------------------
Kölner Landstrasse 419
40589 Düsseldorf, NRW, Germany
tolj.org system architecture Kannel Software Foundation (KSF)
http://www.tolj.org/ http://www.kannel.org/
mailto:st_{at}_tolj.org mailto:stolj_{at}_kannel.org
-------------------------------------------------------------------