Hi,

Sorry for butting in, but this is a very interesting conversation and I couldn't resist.

I imagine that in this case you don't get a SIGPIPE from trying to write to a closed socket. But if you check the number of bytes written, you should be getting an error. Alarms may be needed because I smell timeouts on the socket. This could provide a rollback procedure to reclaim the message and reset the state to offline. Of course, this would only be a quick workaround until the states are fixed, since I anticipate that the receiver will reset the state to online, and the transmitter to offline again.
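
To make the rollback idea concrete, here is a minimal sketch in Python (not Kannel's C code). The names send_or_rollback, queue and state are hypothetical and only illustrate the idea. It assumes a Unix-like system, where Python ignores SIGPIPE by default, so a write to a closed socket surfaces as an error instead of a signal. Note that with a silent teardown the local write may still succeed, because the kernel only buffers the data; the timeout here only bounds a send that blocks, for example once the local send buffer has filled up.

```python
import socket
from collections import deque


def send_or_rollback(sock, payload, queue, state, timeout=5.0):
    """Try to send payload; on failure requeue it and mark the link offline."""
    sock.settimeout(timeout)  # the "alarm": bounds how long a blocked send may hang
    try:
        sock.sendall(payload)
        return True
    except OSError:
        # OSError covers BrokenPipeError, ConnectionResetError and timeouts.
        queue.appendleft(payload)    # reclaim the message so it is not lost
        state["status"] = "offline"  # stop routing over this link
        return False


# Demonstration with a local socket pair whose peer end has been closed.
local, peer = socket.socketpair()
peer.close()
queue = deque()
state = {"status": "online"}
ok = send_or_rollback(local, b"hypothetical MT message", queue, state)
local.close()
print(ok, state["status"], list(queue))
```

This only covers the case where the write itself fails or blocks. It does not detect a link that silently swallows data, which is why the state handling still needs a proper fix.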

Is this something feasible for 1.4.2?
Nikos
----- Original Message -----
From: "Stipe Tolj" <[email protected]>
Cc: <[email protected]>
Sent: Saturday, January 10, 2009 7:18 PM
Subject: Re: Kannel 1.4.2 out


Alexander Malysh schrieb:

As far as I know it only happens on SMPP connections with receiver and
sender thread (not transceiver). Is it correct?
If yes, then we need to investigate this bug first or disable 2 Threads
(sender and receiver) and force users to use 2 SMPP connection groups
(one for receiver and one for sender).

yep, I can confirm this as seen on some production (high load) systems.

The effect can be described as follows:

- An SMPP connection that uses 'port' and 'receive-port' in the smsc group.
- This means we have 1 TCP connection for the transmitter session and 1 TCP
connection for the receiver session.
- Due to the architecture design, an "smsc" can have only 1 state. Normally this
would be "online" internally, so the abstraction layer of bearerbox will
consider it a valid route in its routing decision.
- Now, the magic part: we get a so-called "silent TCP teardrop", which means
the TCP connections are "semantically disconnected" by a middle router, but
the TCP end-points (server, client) don't get a corresponding TCP drop packet.
In this state we (Kannel) "believe" we're still connected. If we use
enquire_link PDUs to ensure we "are" connected, we will notice we're not and
drop the TCP connections, trying to re-establish the connection.
- The "fun" thing is: if the TCP teardrop happens only for the transmitter
session, the receiver session will still keep getting enquire_link_resp PDUs
back from the SMSC side and keep the overall module state "online", and hence
the transmitter part is NOT re-establishing the connection.
- The result: we end up with a transmitter session that seems "online" from the
perspective of the bearerbox abstraction layer, so it WILL route MTs this way,
and we keep pushing them into the "lost sink".

What we need to check is the state handling here.
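
The difference between one shared state and per-session state can be sketched as follows. This is an illustrative Python model, not Kannel code: the classes Session and SmscLink, the method names, and the 30-second timeout are all assumptions made for the example. Each session tracks when it last saw an enquire_link_resp, and MT routing depends on the transmitter session alone.

```python
class Session:
    """One TCP session (transmitter or receiver) with its own liveness clock."""

    def __init__(self, name, now):
        self.name = name
        self.last_resp = now  # time of the last enquire_link_resp seen

    def on_enquire_link_resp(self, now):
        self.last_resp = now

    def is_alive(self, now, timeout):
        return (now - self.last_resp) <= timeout


class SmscLink:
    """Hypothetical model of an smsc group with separate tx and rx sessions."""

    def __init__(self, now, timeout=30.0):
        self.timeout = timeout
        self.tx = Session("transmitter", now)
        self.rx = Session("receiver", now)

    def shared_state_online(self, now):
        # Behaviour described above: a response on either session keeps
        # the single shared state "online".
        return (self.tx.is_alive(now, self.timeout)
                or self.rx.is_alive(now, self.timeout))

    def can_route_mt(self, now):
        # Per-session variant: MT routing looks at the transmitter only.
        return self.tx.is_alive(now, self.timeout)


# The transmitter goes silent at t=0; the receiver keeps answering.
link = SmscLink(now=0.0)
link.rx.on_enquire_link_resp(now=60.0)
shared = link.shared_state_online(now=60.0)
routable = link.can_route_mt(now=60.0)
print(shared, routable)
```

With the shared state the link still looks "online" at t=60, while the per-session check reports that MTs should no longer be routed through the transmitter.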

@Arne: is this pretty much the behavior you have also been facing?

Stipe

--
-------------------------------------------------------------------
Kölner Landstrasse 419
40589 Düsseldorf, NRW, Germany

tolj.org system architecture      Kannel Software Foundation (KSF)
http://www.tolj.org/              http://www.kannel.org/

mailto:st_{at}_tolj.org           mailto:stolj_{at}_kannel.org
-------------------------------------------------------------------


