Status: Accepted
Owner: brainslog
CC: [email protected]
Labels: Type-Defect Priority-High Component-DIAMETER-Stack Usability
Roadmap-Fix DIAMETER-1.5.0.FINAL
New issue 18 by brainslog: Diameter Stack: Concurrency Problem by handling
CER
http://code.google.com/p/jdiameter/issues/detail?id=18
Reported by angel.t.angelov, Jul 11, 2012:
Hi, I used Diameter stack on slow solaris server without problem.
I migrated to more powerful Linux server and I got problems with the
CER-CEA.
That mean, we have obviously concurrency problem.
To find the problem I put some new debug lines into
org.jdiameter.client.impl.fsm.PeerFSMImpl (see the file attached)
See attached the full server log (server.zip) and the extract of this log
regarding the event handling and executor thread (trace.txt)
Let take one event from trace.txt
For example: Event{name:CER_EVENT, key:aaa://172.23.86.7:50588,
object:MessageImpl{commandCode=257, flags=128}}
2012-07-11 14:00:39,605 DEBUG [NetworkGuard-0] [PeerFSMImpl] AA: Put Event:
Event{name:CER_EVENT, key:aaa://172.23.86.7:50588,
object:MessageImpl{commandCode=257, flags=128}}
2012-07-11 14:00:39,605 DEBUG [NetworkGuard-0] [PeerFSMImpl] AA: Put Event:
Event{name:CER_EVENT, key:aaa://172.23.86.7:50588,
object:MessageImpl{commandCode=257, flags=128}}
The first question: why this Event is put twice into the queue!?
After this 2 Events are put to the queue, other Event is extracted and the
thread executor is stopped:
2012-07-11 14:00:39,605 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578;
State=DOWN;
con=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@9aa402;
incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110,
aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528,
aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6}
}-14] [PeerFSMImpl] AA: Got Event: Event{name:CER_EVENT,
key:aaa://172.23.86.7:50583, object:MessageImpl{commandCode=257, flags=128}}
2012-07-11 14:00:39,706 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578;
State=DOWN;
con=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@9aa402;
incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110,
aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528,
aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6}
}-14] [PeerFSMImpl] AA: Executor finished
After 2 seconds new executor is created. That mean 2 seconds the initial
Event is not handled
2012-07-11 14:00:41,602 DEBUG [TCPReader-13] [PeerFSMImpl] AA: Executor
created: FSM-SPeer{Uri=aaa://gbg41.um.internal:50578; State=DOWN; con=null;
incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110,
aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528,
aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6}
}-15
And than the event is got for handling:
2012-07-11 14:00:41,604 DEBUG [FSM-SPeer{Uri=aaa://gbg41.um.internal:50578;
State=DOWN; con=null;
incCon{aaa://172.23.86.7:50588=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@82f110,
aaa://172.23.86.7:50578=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@110f528,
aaa://172.23.86.7:50577=org.jdiameter.client.impl.transport.tcp.TCPClientConnection@1b4a4b6}
}-15] [PeerFSMImpl] AA: Got Event: Event{name:CER_EVENT,
key:aaa://172.23.86.7:50588, object:MessageImpl{commandCode=257, flags=128}}
But after 2 seconds the client has dropped the connection (timed out) and
the sending of CEA fails because the connection is closed by the client.
And this repeats to infinite.
This happens with every Event you can see in the logs.
And this behaviour is reproducible any time on this server.
I found (after 3 days of debugging :-( ) a simple work around for this:
I just disabled the executor thread destroying and recreating.
I disabled this by commenting the following line:
executor = null;
in
org.jdiameter.client.impl.fsm.PeerFSMImpl
and
org.jdiameter.server.impl.fsm.PeerFSMImpl
Actually, this is very strange way to stop an executor thread.
Please, some developer of jdiameter analyse this problem and give me
feedback. Is my work around OK? Will be provided some general fix of the
issue?
I got a lot of experience last time with jdiameter and I'm ready to help if
you like it.
P.S. The source code of the jdiameter is written in very strange way and it
is very difficult to debug and find problems like this. I think this code
should be optimized.
Regards,
Angel
Attachments:
PeerFSMImpl.java 25.4 KB
server.zip 92.0 KB
trace.txt 83.1 KB