Status: Accepted
Owner: brainslog
CC: [email protected]
Labels: Type-Defect Priority-High Component-DIAMETER-Stack Usability Roadmap-Fix DIAMETER-1.5.0.FINAL

New issue 18 by brainslog: Diameter Stack: Concurrency Problem by handling CER
http://code.google.com/p/jdiameter/issues/detail?id=18

Reported by angel.t.angelov, Jul 11, 2012:

Hi, I used the Diameter stack on a slow Solaris server without problems.
I migrated to a more powerful Linux server and started getting problems with the CER/CEA exchange.
That means we obviously have a concurrency problem.

To find the problem, I added some debug lines to org.jdiameter.client.impl.fsm.PeerFSMImpl (see the attached file).

Attached are the full server log (server.zip) and an extract of this log covering the event handling and the executor thread (trace.txt).

Let's take one event from trace.txt.
For example: Event{name:CER_EVENT, key:aaa://172.23.86.7:50588, object:MessageImpl{commandCode=257, flags=128}}


2012-07-11 14:00:39,605 DEBUG [NetworkGuard-0] [PeerFSMImpl] AA: Put Event: Event{name:CER_EVENT, key:aaa://172.23.86.7:50588, object:MessageImpl{commandCode=257, flags=128}}
2012-07-11 14:00:39,605 DEBUG [NetworkGuard-0] [PeerFSMImpl] AA: Put Event: Event{name:CER_EVENT, key:aaa://172.23.86.7:50588, object:MessageImpl{commandCode=257, flags=128}}

The first question: why is this event put into the queue twice?

After these two events are put into the queue, a different event (key aaa://172.23.86.7:50583) is taken from the queue and the executor thread stops:

2012-07-11 14:00:39,605 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578; State=DOWN; con=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@9aa402; incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110, aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528, aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6} }-14] [PeerFSMImpl] AA: Got Event: Event{name:CER_EVENT, key:aaa://172.23.86.7:50583, object:MessageImpl{commandCode=257, flags=128}}
2012-07-11 14:00:39,706 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578; State=DOWN; con=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@9aa402; incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110, aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528, aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6} }-14] [PeerFSMImpl] AA: Executor finished

About two seconds later a new executor is created. That means the initial event stays unhandled for about two seconds:

2012-07-11 14:00:41,602 DEBUG [TCPReader-13] [PeerFSMImpl] AA: Executor created: FSM-SPeer{Uri=aaa://gbg41.um.internal:50578; State=DOWN; con=null; incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110, aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528, aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6} }-15

And then the event is finally taken for handling:
2012-07-11 14:00:41,604 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578; State=DOWN; con=null; incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110, aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528, aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6} }-15] [PeerFSMImpl] AA: Got Event: Event{name:CER_EVENT, key:aaa://172.23.86.7:50588, object:MessageImpl{commandCode=257, flags=128}}

But by then (about two seconds later) the client has already timed out and dropped the connection, so sending the CEA fails because the connection has been closed by the client.

And this repeats indefinitely.

This happens with every event you can see in the logs, and the behaviour is reproducible at any time on this server.
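The log sequence above is consistent with a classic lost-wakeup race in a design that stops the executor when the queue looks idle and recreates it on the next incoming event. The class below is a hypothetical, simplified model of that pattern (LostWakeupDemo, putEvent, workerPoll and workerExit are illustrative names, not PeerFSMImpl code). To make the race deterministic, it replays one bad interleaving step by step on a single thread; in the real stack these steps would run on the network thread and the FSM thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// HYPOTHETICAL model of a "stop the executor when idle, recreate it on the
// next event" design. Names are illustrative, NOT taken from PeerFSMImpl.
// Steps are called one by one to replay a single racy interleaving
// deterministically; real code would run them on different threads.
public class LostWakeupDemo {
    private final Queue<String> queue = new ArrayDeque<>();
    private Object executor = null;   // non-null while a worker is assumed alive
    private int executorsCreated = 0;

    // Network thread: enqueue, and start a worker only if none seems to exist.
    public void putEvent(String event) {
        queue.add(event);
        if (executor == null) {
            executor = new Object();
            executorsCreated++;
        }
    }

    // Worker step 1: poll the queue; null means the queue looked idle.
    public String workerPoll() {
        return queue.poll();
    }

    // Worker step 2: after an empty poll, the worker stops and clears its reference.
    public void workerExit() {
        executor = null;
    }

    public int pendingCount() { return queue.size(); }
    public boolean workerRunning() { return executor != null; }
    public int executorsCreated() { return executorsCreated; }

    public static void main(String[] args) {
        LostWakeupDemo fsm = new LostWakeupDemo();
        fsm.putEvent("EARLIER_EVENT");    // creates the worker
        System.out.println("worker got: " + fsm.workerPoll());
        String second = fsm.workerPoll(); // queue is empty, worker decides to stop...
        fsm.putEvent("CER_EVENT");        // ...but a CER arrives first: executor != null, so no new worker
        fsm.workerExit();                 // worker sets executor = null and exits
        System.out.println("second poll: " + second);
        System.out.println("pending: " + fsm.pendingCount()
                + ", worker running: " + fsm.workerRunning());
    }
}
```

Running main reports one pending event and no running worker: in this model the CER is only picked up when some later putEvent() call finds executor == null and creates a new worker, which would correspond to the roughly two-second gap in the log. Typical remedies are to make "enqueue and check" and "check empty and exit" atomic (for example under one lock), or to have the worker re-check the queue after clearing its reference.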

I found (after three days of debugging :-( ) a simple workaround: I disabled the destruction and recreation of the executor thread by commenting out the line

executor = null;

in both org.jdiameter.client.impl.fsm.PeerFSMImpl and org.jdiameter.server.impl.fsm.PeerFSMImpl.

Actually, this is a very strange way to stop an executor thread.
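For comparison, a more conventional way to run and stop a per-peer event loop is a single executor that is created once and shut down explicitly, so there is no window in which an event can be queued while no worker is running. The class below is a hypothetical sketch using only java.util.concurrent; LongLivedFsmExecutor, putEvent and stop are illustrative names, the event strings are placeholders, and this is not how jdiameter is actually structured.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// HYPOTHETICAL sketch, not jdiameter code: one single-thread executor created
// once per FSM and stopped explicitly, instead of being nulled and recreated.
public class LongLivedFsmExecutor {
    // Created once; its single worker thread waits while idle rather than exiting.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> handled = new CopyOnWriteArrayList<>();

    // Events are handled one at a time, in submission order.
    public void putEvent(String event) {
        executor.execute(() -> handled.add(event)); // stand-in for real FSM handling
    }

    // Explicit shutdown: finish already queued events, then let the thread terminate.
    public void stop() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public List<String> handled() { return handled; }

    public static void main(String[] args) throws InterruptedException {
        LongLivedFsmExecutor fsm = new LongLivedFsmExecutor();
        fsm.putEvent("CER_EVENT");
        Thread.sleep(200);                 // idle gap: nothing has to be recreated afterwards
        fsm.putEvent("DWR_EVENT");
        fsm.stop();
        System.out.println(fsm.handled()); // prints [CER_EVENT, DWR_EVENT]
    }
}
```

The trade-off is one idle thread per peer FSM. After stop(), further putEvent() calls are rejected with a RejectedExecutionException, which makes the end of the FSM's lifetime explicit instead of implicit.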

Could one of the jdiameter developers please analyse this problem and give me feedback? Is my workaround OK? Will a general fix for the issue be provided?

I have gained a lot of experience with jdiameter recently, and I'm ready to help if you like.


P.S. The jdiameter source code is written in a very strange way, which makes it very difficult to debug and find problems like this. I think this code should be improved.


Regards,
Angel

Attachments:
        PeerFSMImpl.java  25.4 KB
        server.zip  92.0 KB
        trace.txt  83.1 KB
