Alex Rudyy created QPID-8276:
--------------------------------
Summary: [Broker-J] Broker can leak closed NonBlockingConnection
objects and eventually run out of heap memory
Key: QPID-8276
URL: https://issues.apache.org/jira/browse/QPID-8276
Project: Qpid
Issue Type: Improvement
Components: Broker-J
Affects Versions: qpid-java-broker-7.0.6, qpid-java-broker-7.0.5,
qpid-java-broker-7.0.4, qpid-java-broker-7.1.0, qpid-java-6.1.7,
qpid-java-broker-7.0.1, qpid-java-broker-7.0.0, qpid-java-broker-7.0.2,
qpid-java-broker-7.0.3
Reporter: Alex Rudyy
Fix For: qpid-java-broker-7.0.7, qpid-java-broker-7.1.1
The Qpid Broker-J can leak closed NonBlockingConnection objects.
Heap dump analysis of an impacted broker instance revealed that the leaked
{{NonBlockingConnection}} objects accumulate in
{{SelectorThread.SelectionTask#_unscheduledConnections}} belonging to the AMQP
port IO pool. They have no ticker set and no state changed flag set
({{NonBlockingConnection#isStateChanged() == false}}). As a result, the
{{NonBlockingConnection}} objects are not removed from
{{SelectorThread.SelectionTask#_unscheduledConnections}} on invocation of
{{SelectorThread.SelectionTask#processUnscheduledConnections()}} called from
{{SelectorThread.SelectionTask#performSelect()}}.
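The retention behaviour described above can be illustrated with a minimal
sketch. This is not the Broker-J source: {{ToyConnection}},
{{UnscheduledQueueSketch}} and {{processUnscheduled}} are hypothetical names,
and the removal rule (a connection leaves the collection only when its state
changed flag is set or a ticker is due) is an assumption taken from the
description in this issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for NonBlockingConnection; only the flags
// discussed in this issue are modelled.
class ToyConnection {
    final String name;
    final boolean closed;
    final boolean stateChanged;
    final boolean tickerDue;

    ToyConnection(String name, boolean closed, boolean stateChanged, boolean tickerDue) {
        this.name = name;
        this.closed = closed;
        this.stateChanged = stateChanged;
        this.tickerDue = tickerDue;
    }
}

class UnscheduledQueueSketch {
    // Assumed rule: a connection is handed to the scheduler (and removed from
    // the unscheduled list) only if its state changed or a ticker is due.
    // Everything else stays in the list, whether or not it is closed.
    static List<ToyConnection> processUnscheduled(List<ToyConnection> unscheduled) {
        List<ToyConnection> scheduled = new ArrayList<>();
        Iterator<ToyConnection> it = unscheduled.iterator();
        while (it.hasNext()) {
            ToyConnection connection = it.next();
            if (connection.stateChanged || connection.tickerDue) {
                it.remove();
                scheduled.add(connection);
            }
        }
        return scheduled;
    }
}
```

In this model a connection that is closed, has no ticker and has no state
change survives every pass over the list, which matches the accumulation seen
in the heap dump.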
The {{NonBlockingConnection}} and the underlying model object are in closed
state. It seems that the leaked {{NonBlockingConnection}} was closed as part of
the invocation of {{NonBlockingConnection#doWork()}}. The connection was
unregistered from the {{VirtualHost}} IO pool and re-registered with the port IO
pool as part of the invocation of
{{NetworkConnectionScheduler#processConnection}}. At first, it was stored in the
collection {{SelectorThread.SelectionTask#_unregisteredConnections}}.
Later on, it was moved from
{{SelectorThread.SelectionTask#_unregisteredConnections}} to
{{SelectorThread.SelectionTask#_unscheduledConnections}} as part of the
invocation of {{SelectorThread.SelectionTask#reregisterUnregisteredConnections}}
and got stuck there afterwards.
The leaked connection used the TLS transport, but I think that a connection
with plain transport can be leaked as well.
I suspect that connections were leaked as a result of the following scenario:
* Invocation of {{SocketChannel#read(java.nio.ByteBuffer[])}} returned {{-1}}
in {{NonBlockingConnection#readFromNetwork}}.
* The flag {{NonBlockingConnection#_closed}} was set to {{true}}. The method
{{ProtocolEngine#notifyWork()}} was not invoked to set the {{state changed}}
flag to {{true}}.
* The execution of {{NonBlockingConnection#doWork()}} ended up in connection
shutdown (due to {{NonBlockingConnection#_closed}} being set), followed by
re-scheduling of the connection on the port IO scheduler. The latter resulted in
the connection being put into
{{SelectorThread.SelectionTask#_unscheduledConnections}} as described above.
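The first two steps of the scenario can be sketched as follows. The class
{{EofReadSketch}} and its members are hypothetical and only mimic the roles of
{{NonBlockingConnection#_closed}} and {{ProtocolEngine#notifyWork()}}. The
{{notifyOnEof}} switch exists purely to contrast the two outcomes and is not a
claim about how the defect was or should be fixed. The only real API relied on
is that a channel read returns {{-1}} at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical model of the read path; not the Broker-J implementation.
class EofReadSketch {
    private final ReadableByteChannel channel;
    private boolean closed;       // plays the role of NonBlockingConnection#_closed
    private boolean stateChanged; // plays the role of the "state changed" flag

    EofReadSketch(ReadableByteChannel channel) {
        this.channel = channel;
    }

    // Plays the role of ProtocolEngine#notifyWork() in this sketch.
    void notifyWork() {
        stateChanged = true;
    }

    // Performs one read; a return value of -1 means the peer closed the stream.
    void readFromNetwork(boolean notifyOnEof) {
        try {
            int read = channel.read(ByteBuffer.allocate(64));
            if (read == -1) {
                closed = true;
                if (notifyOnEof) {
                    notifyWork();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() {
        return closed;
    }

    boolean isStateChanged() {
        return stateChanged;
    }

    // Builds a sketch connection whose channel is already at end of stream.
    static EofReadSketch atEndOfStream() {
        return new EofReadSketch(Channels.newChannel(new ByteArrayInputStream(new byte[0])));
    }
}
```

Calling {{readFromNetwork(false)}} on {{EofReadSketch.atEndOfStream()}} leaves
the object closed with the state changed flag still unset, which is the
combination found on the leaked connections.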
It seems that frequently opening and closing connections with a connection life
span {{>10s}} (required for tickers to be removed) can end up in connections
being leaked as described in the scenario above. It looks like connections which
are closed orderly or closed as a result of an {{IOException}} being thrown from
a socket read/write operation are not affected by the defect.
The impacted broker instance can eventually crash with an out-of-memory error.
Broker memory monitoring and periodic broker restarts can mitigate the issue.
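As a rough illustration of the monitoring mitigation, the sketch below checks
heap occupancy through the standard {{java.lang.management}} API. The class
name {{HeapWatchSketch}} and the idea of comparing against a fixed threshold
are assumptions for illustration. A real deployment would more likely poll the
same figures remotely over JMX or from an external monitoring agent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper; not part of Broker-J.
class HeapWatchSketch {
    // Returns used/max, or -1.0 when the JVM reports no defined maximum.
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return (double) usedBytes / (double) maxBytes;
    }

    // True when the used fraction is defined and at or above the threshold.
    static boolean exceeds(long usedBytes, long maxBytes, double threshold) {
        double fraction = usedFraction(usedBytes, maxBytes);
        return fraction >= 0 && fraction >= threshold;
    }

    // Applies the check to the heap of the JVM this code runs in.
    static boolean currentHeapExceeds(double threshold) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return exceeds(heap.getUsed(), heap.getMax(), threshold);
    }
}
```

A scheduled task could call {{HeapWatchSketch.currentHeapExceeds(0.9)}} and
raise an alert or trigger a planned restart. The value {{0.9}} is an arbitrary
example, not a recommendation.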