Status: New
Owner: ----
Labels: Component-Diameter Type-Enhancement Priority-Medium Version-1.4.0
Release-Type-FINAL Roadmap-Fix
New issue 2446 by [email protected]: Unknown realm name [null]
error when sending diameter response under high concurrency
http://code.google.com/p/mobicents/issues/detail?id=2446
What steps will reproduce the problem?
1. Use a Diameter client to send a large number of concurrent requests
(e.g. 10 threads and 10000 requests per thread) into the SLEE Diameter RA
2. After a while, an error is logged indicating that the response
message could not be sent because the realm name is null.
What is the expected output? What do you see instead?
The realm name is known to the stack, so this error should not occur.
What version of the product are you using? On what operating system?
jdiameter-impl-1.5.4.1-build415.jar on Solaris
Please provide any additional information below.
An investigation shows that this is due to the logic in RouterImpl and
the way it looks up a response's destination host and realm by
hop-by-hop id. The ids are stored in a Map when a request comes in, and
retrieved when a response is sent. The issue is that the code prunes the
first half of the map when its size reaches 10240.
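
The behaviour described above can be illustrated with a minimal sketch.
This is not the actual RouterImpl code: the class name, method names and
the use of a single-threaded LinkedHashMap are assumptions made only to
show how pruning the oldest half turns a later lookup into a null realm.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the router's request table (not jdiameter code).
class HalfPruneSketch {
    static final int MAX_SIZE = 10240; // threshold quoted in this report

    // Insertion-ordered map: hop-by-hop id -> realm of the request's sender.
    final Map<Long, String> requestEntries = new LinkedHashMap<>();

    // Called when a request arrives: remember where the response must go.
    void onRequest(long hopByHopId, String realm) {
        requestEntries.put(hopByHopId, realm);
        if (requestEntries.size() >= MAX_SIZE) {
            // Drop the oldest half, whether or not a response is still pending.
            int toRemove = requestEntries.size() / 2;
            Iterator<Long> it = requestEntries.keySet().iterator();
            for (int i = 0; i < toRemove && it.hasNext(); i++) {
                it.next();
                it.remove();
            }
        }
    }

    // Called when a response is sent: null means the realm is no longer known.
    String onResponse(long hopByHopId) {
        return requestEntries.remove(hopByHopId);
    }

    public static void main(String[] args) {
        HalfPruneSketch router = new HalfPruneSketch();
        for (long id = 1; id <= MAX_SIZE; id++) {
            router.onRequest(id, "example.org");
        }
        System.out.println(router.onResponse(1L));    // null (pruned)
        System.out.println(router.onResponse(5121L)); // example.org
    }
}
```

In this sketch the request with id 1 is still being processed when the
table fills up, so its response has no realm to be routed to.
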
Under high load on a powerful server, let's say 1000 requests per second
come into the stack. After about 10s the map reaches 10240 entries and
the first 5120 hop-by-hop ids are removed. Responses for messages received
in the first 5s that take longer than about 5s to process then fail to be
sent.
Although this is not common, it's undesirable to lose messages
ungracefully like that under load. I would recommend changing the pruning
logic to remove the first 10% of the current size of the Map when it
exceeds 20*1024 entries (as opposed to the current logic of removing
5120 when it reaches 10240). This still keeps the map from growing
without bound and gives a larger window under high load.
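
A minimal sketch of the recommended pruning, under the same assumptions
as the description above. The class, field and method names are
hypothetical and the map is not thread-safe; a real fix would have to
fit the concurrent structures RouterImpl actually uses.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed pruning rule (not jdiameter code).
class TenPercentPruneSketch {
    static final int MAX_SIZE = 20 * 1024; // proposed threshold: 20480 entries
    static final int PRUNE_DIVISOR = 10;   // remove 1/10 of the current size

    // Insertion-ordered map: hop-by-hop id -> realm of the request's sender.
    final Map<Long, String> requestEntries = new LinkedHashMap<>();

    void onRequest(long hopByHopId, String realm) {
        requestEntries.put(hopByHopId, realm);
        if (requestEntries.size() > MAX_SIZE) {
            // Remove only the oldest 10% instead of the oldest half.
            int toRemove = requestEntries.size() / PRUNE_DIVISOR;
            Iterator<Long> it = requestEntries.keySet().iterator();
            for (int i = 0; i < toRemove && it.hasNext(); i++) {
                it.next();
                it.remove();
            }
        }
    }

    String onResponse(long hopByHopId) {
        return requestEntries.remove(hopByHopId);
    }

    public static void main(String[] args) {
        TenPercentPruneSketch router = new TenPercentPruneSketch();
        for (long id = 1; id <= MAX_SIZE + 1; id++) {
            router.onRequest(id, "example.org");
        }
        System.out.println(router.onResponse(2048L));     // null (pruned)
        System.out.println(router.onResponse(2049L));     // example.org
        System.out.println(router.requestEntries.size()); // 18432
    }
}
```

At the assumed 1000 requests per second, about 18400 entries survive a
prune here, so a pending request keeps its entry for roughly 18s, compared
with roughly 5s when half of a 10240-entry map is removed.
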