Daniil Kirilyuk created QPID-8526:
-------------------------------------
Summary: Connection looping in
NonBlockingConnectionTLSDelegate.doWrite()
Key: QPID-8526
URL: https://issues.apache.org/jira/browse/QPID-8526
Project: Qpid
Issue Type: Bug
Components: Broker-J
Affects Versions: qpid-java-broker-8.0.4
Reporter: Daniil Kirilyuk
Issue is similar to QPID-8489: on certain conditions SSL connection thread
falls into an infinite loop consuming CPU.
Stacktrace from the threaddump:
{noformat}
"IO-/127.0.0.1:50414" #32 prio=5 os_prio=0 cpu=43504522.88ms
elapsed=330519.20s tid=0x00007f47a0375000 nid=0x9089 runnable
[0x00007f47956fe000]
java.lang.Thread.State: RUNNABLE
at
sun.security.ssl.TransportContext.getHandshakeStatus([email protected]/TransportContext.java:571)
at
sun.security.ssl.SSLEngineImpl.getHandshakeStatus([email protected]/SSLEngineImpl.java:801)
- locked <0x00007f4b721e7340> (a sun.security.ssl.SSLEngineImpl)
at
org.apache.qpid.server.transport.NonBlockingConnectionTLSDelegate.wrapBufferArray(NonBlockingConnectionTLSDelegate.java:239)
at
org.apache.qpid.server.transport.NonBlockingConnectionTLSDelegate.doWrite(NonBlockingConnectionTLSDelegate.java:164)
at
org.apache.qpid.server.transport.NonBlockingConnection.doWrite(NonBlockingConnection.java:521)
at
org.apache.qpid.server.transport.NonBlockingConnection.shutdownFinalWrite(NonBlockingConnection.java:422)
at
org.apache.qpid.server.transport.NonBlockingConnection.shutdown(NonBlockingConnection.java:384)
at
org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:313)
at
org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
at
org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
at
org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
at
org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
at
org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
at
java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
at
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
at
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$64/0x00007f47d52bcc40.run(Unknown
Source)
at java.lang.Thread.run([email protected]/Thread.java:834)
{noformat}
Stacktrace 2 from the threaddump:
{noformat}
"IO-/127.0.0.1:50414" #32 prio=5 os_prio=0 cpu=43536330.84ms
elapsed=330555.26s tid=0x00007f47a0375000 nid=0x9089 runnable
[0x00007f47956fe000]
java.lang.Thread.State: RUNNABLE
at
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.encryptSSL(QpidByteBufferFactory.java:178)
at
org.apache.qpid.server.bytebuffer.QpidByteBuffer.encryptSSL(QpidByteBuffer.java:62)
at
org.apache.qpid.server.transport.NonBlockingConnectionTLSDelegate.wrapBufferArray(NonBlockingConnectionTLSDelegate.java:255)
at
org.apache.qpid.server.transport.NonBlockingConnectionTLSDelegate.doWrite(NonBlockingConnectionTLSDelegate.java:164)
at
org.apache.qpid.server.transport.NonBlockingConnection.doWrite(NonBlockingConnection.java:521)
at
org.apache.qpid.server.transport.NonBlockingConnection.shutdownFinalWrite(NonBlockingConnection.java:422)
at
org.apache.qpid.server.transport.NonBlockingConnection.shutdown(NonBlockingConnection.java:384)
at
org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:313)
at
org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
at
org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
at
org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
at
org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
at
org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
at
java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
at
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
at
org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$64/0x00007f47d52bcc40.run(Unknown
Source)
at java.lang.Thread.run([email protected]/Thread.java:834)
{noformat}
Stucked connection details (retrieved via broker REST API):
{noformat}
{
"id" : "60b493c2-de91-4690-8e4c-1e3d5d5ea353",
"name" : "[8332442] 127.0.0.1:64711",
"type" : "AMQP_1_0",
"desiredState" : "ACTIVE",
"state" : "ACTIVE",
"durable" : false,
"lifetimePolicy" : "PERMANENT",
"incoming" : true,
"incomingIdleTimeout" : 60000,
"lastOpenedTime" : 1618942801846,
"maxUncommittedInMemorySize" : 10485760,
"outgoingIdleTimeout" : 0,
"port" : "amqps",
"protocol" : "AMQP_1_0",
"remoteAddress" : "/127.0.0.1:64711",
"sendSaslFinalChallengeAsChallenge" : false,
"sessionCountLimit" : 1,
"transport" : "SSL",
"transportInfo" : "TLSv1.2 ; TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
"lastUpdatedTime" : 1618942801846,
"createdTime" : 1618942801846,
"statistics" : {
"bytesIn" : 0,
"bytesOut" : 0,
"lastInboundMessageTime" : 1618942801846,
"lastIoTime" : 1618942801846,
"lastMessageTime" : 1618942801846,
"lastOutboundMessageTime" : 1618942801846,
"localTransactionBegins" : 0,
"localTransactionOpen" : 0,
"localTransactionRollbacks" : 0,
"messagesIn" : 0,
"messagesOut" : 0,
"sessionCount" : 0,
"transactedMessagesIn" : 0,
"transactedMessagesOut" : 0
}
}
{noformat}
It seems that broker tries to close SSL connection and loops inside
NonBlockingConnection.shutdownFinalWrite() continuously calling
NonBlockingConnection.doWrite() -> NonBlockingConnectionTLSDelegate.doWrite()
-> NonBlockingConnectionTLSDelegate.wrapBufferArray().
After adding circuit breaker and explicit logging in methods
NonBlockingConnection.doWrite(), NonBlockingConnectionTLSDelegate.doWrite() and
NonBlockingConnectionTLSDelegate.wrapBufferArray() following logs were
retrieved:
{noformat}
2021-05-10 06:30:54,637 INFO [Broker-Config] (q.m.c.open) -
[con:9,435,937(/127.0.0.1:52648)] CON-1001 : Open : Destination :
amqps(90.163.253.5:10170) : Protocol Version : 1.0 : SSL
2021-05-10 06:30:54,688 INFO [IO-/127.0.0.1:52648] (q.m.c.open) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)] CON-1001 : Open
: Destination : amqps(90.163.253.5:10170) : Protocol Version : 1.0 : SSL :
Client ID : Exxeta-299e32f1-1d0a-4db6-aeb8-1822411b7fc6 : Client Version :
0.50.0 : Client Product : QpidJMS
2021-05-10 06:30:54,689 INFO [IO-/127.0.0.1:52648] (q.m.a.allowed) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)] ACL-1001 :
Allowed : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
2021-05-10 06:30:54,709 INFO [IO-/127.0.0.1:52648] (q.m.c.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:0] CHN-1001 :
Create
2021-05-10 06:30:54,739 INFO [IO-/127.0.0.1:52648] (q.m.c.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] CHN-1001 :
Create
2021-05-10 06:30:54,771 INFO [VirtualHostNode-default-Config] (q.m.a.allowed)
- [con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] ACL-1001
: Allowed : Consume Queue
NAME=broadcast.VTVTV_YYYYXXXXXZZZSSS.TradeConfirmation,DURABLE=true,EXCLUSIVE=false,TEMPORARY=false,AUTO_DELETE=false,VIRTUALHOST_NAME=default,CREATED_BY=admin
2021-05-10 06:30:54,771 INFO [VirtualHostNode-default-Config] (q.m.s.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,978(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.TradeConfirmation)]
SUB-1001 : Create
2021-05-10 06:30:54,799 INFO [VirtualHostNode-default-Config] (q.m.a.allowed)
- [con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] ACL-1001
: Allowed : Consume Queue
NAME=broadcast.VTVTV_YYYYXXXXXZZZSSS.Workflow,DURABLE=true,EXCLUSIVE=false,TEMPORARY=false,AUTO_DELETE=false,VIRTUALHOST_NAME=default,CREATED_BY=admin
2021-05-10 06:30:54,799 INFO [VirtualHostNode-default-Config] (q.m.s.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,979(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.Workflow)] SUB-1001
: Create
2021-05-10 06:30:54,816 INFO [VirtualHostNode-default-Config] (q.m.a.allowed)
- [con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] ACL-1001
: Allowed : Consume Queue
NAME=broadcast.VTVTV_YYYYXXXXXZZZSSS.Public,DURABLE=true,EXCLUSIVE=false,TEMPORARY=false,AUTO_DELETE=false,VIRTUALHOST_NAME=default,CREATED_BY=admin
2021-05-10 06:30:54,816 INFO [VirtualHostNode-default-Config] (q.m.s.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,980(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.Public)] SUB-1001 :
Create
2021-05-10 06:30:54,834 INFO [VirtualHostNode-default-Config] (q.m.a.allowed)
- [con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] ACL-1001
: Allowed : Consume Queue
NAME=response.VTVTV_YYYYXXXXXZZZSSS,DURABLE=true,EXCLUSIVE=false,TEMPORARY=false,AUTO_DELETE=false,VIRTUALHOST_NAME=default,CREATED_BY=admin
2021-05-10 06:30:54,834 INFO [VirtualHostNode-default-Config] (q.m.s.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,981(vh(/default)/qu(response.VTVTV_YYYYXXXXXZZZSSS)] SUB-1001 : Create
2021-05-10 06:30:54,856 INFO [IO-/127.0.0.1:52648] (q.m.s.create) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] SND-1001 :
Create :
qpid-jms:sender:ID:e72a83ab-2332-4e64-b891-0bf26cb8f632:1:1:1:request.VTVTV_YYYYXXXXXZZZSSS
: request.VTVTV_YYYYXXXXXZZZSSS
2021-05-10 10:01:40,576 INFO [IO-/127.0.0.1:52648] (q.m.c.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:0]
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:0] CHN-1003 :
Close
2021-05-10 10:01:40,576 INFO [VirtualHostNode-default-Config] (q.m.s.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,978(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.TradeConfirmation)]
SUB-1002 : Close
2021-05-10 10:01:40,576 INFO [IO-/127.0.0.1:52648] (q.m.s.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] SND-1002 :
Close :
qpid-jms:sender:ID:e72a83ab-2332-4e64-b891-0bf26cb8f632:1:1:1:request.VTVTV_YYYYXXXXXZZZSSS
: request.VTVTV_YYYYXXXXXZZZSSS
2021-05-10 10:01:40,577 INFO [VirtualHostNode-default-Config] (q.m.s.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,980(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.Public)] SUB-1002 :
Close
2021-05-10 10:01:40,578 INFO [VirtualHostNode-default-Config] (q.m.s.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,981(vh(/default)/qu(response.VTVTV_YYYYXXXXXZZZSSS)] SUB-1002 : Close
2021-05-10 10:01:40,578 INFO [VirtualHostNode-default-Config] (q.m.s.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[sub:36,979(vh(/default)/qu(broadcast.VTVTV_YYYYXXXXXZZZSSS.Workflow)] SUB-1002
: Close
2021-05-10 10:01:40,579 INFO [IO-/127.0.0.1:52648] (q.m.c.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1]
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)/ch:1] CHN-1003 :
Close
2021-05-10 10:01:40,579 INFO [IO-/127.0.0.1:52648]
(o.a.q.s.p.v.AMQPConnection_1_0Impl) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@/127.0.0.1:52648/default)] : Connection
closed with error : amqp:resource-limit-exceeded - local-idle-timeout expired
2021-05-10 10:01:40,644 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - bufCount: 2
2021-05-10 10:01:40,644 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - _status: [ Status = CLOSED,
HandshakeStatus = NOT_HANDSHAKING, bytesConsumed = 0, bytesProduced = 0 ],
_sslEngine.isOutboundDone(): true, _sslEngine.isInboundDone(): true,
_sslEngine.getPeerHost(): null, _sslEngine.getPeerPort(): -1, buffers.size():
2, _encryptedOutput.size(): 0, _netOutputBuffer.remaining(): 152185,
_sslEngine.getSession().getPacketBufferSize(): 16709,
_sslEngine.getSession().getCreationTime(): 1620621054549,
_sslEngine.getSession().getLastAccessedTime(): 1620621054549,
_sslEngine.getHandshakeSession().isValid(): null handshake session
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - buffers.size(): 2, bufsSent:
false, bytesWritten: -1, _encryptedOutput: 0
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnection) - Looping detected: false, 0, 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - bufCount: 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - _status: [ Status = CLOSED,
HandshakeStatus = NOT_HANDSHAKING, bytesConsumed = 0, bytesProduced = 0 ],
_sslEngine.isOutboundDone(): true, _sslEngine.isInboundDone(): true,
_sslEngine.getPeerHost(): null, _sslEngine.getPeerPort(): -1, buffers.size():
2, _encryptedOutput.size(): 0, _netOutputBuffer.remaining(): 152185,
_sslEngine.getSession().getPacketBufferSize(): 16709,
_sslEngine.getSession().getCreationTime(): 1620621054549,
_sslEngine.getSession().getLastAccessedTime(): 1620621054549,
_sslEngine.getHandshakeSession().isValid(): null handshake session
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - buffers.size(): 2, bufsSent:
false, bytesWritten: -1, _encryptedOutput: 0
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnection) - Looping detected: false, 0, 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - bufCount: 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - _status: [ Status = CLOSED,
HandshakeStatus = NOT_HANDSHAKING, bytesConsumed = 0, bytesProduced = 0 ],
_sslEngine.isOutboundDone(): true, _sslEngine.isInboundDone(): true,
_sslEngine.getPeerHost(): null, _sslEngine.getPeerPort(): -1, buffers.size():
2, _encryptedOutput.size(): 0, _netOutputBuffer.remaining(): 152185,
_sslEngine.getSession().getPacketBufferSize(): 16709,
_sslEngine.getSession().getCreationTime(): 1620621054549,
_sslEngine.getSession().getLastAccessedTime(): 1620621054549,
_sslEngine.getHandshakeSession().isValid(): null handshake session
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - buffers.size(): 2, bufsSent:
false, bytesWritten: -1, _encryptedOutput: 0
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnection) - Looping detected: false, 0, 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - bufCount: 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - _status: [ Status = CLOSED,
HandshakeStatus = NOT_HANDSHAKING, bytesConsumed = 0, bytesProduced = 0 ],
_sslEngine.isOutboundDone(): true, _sslEngine.isInboundDone(): true,
_sslEngine.getPeerHost(): null, _sslEngine.getPeerPort(): -1, buffers.size():
2, _encryptedOutput.size(): 0, _netOutputBuffer.remaining(): 152185,
_sslEngine.getSession().getPacketBufferSize(): 16709,
_sslEngine.getSession().getCreationTime(): 1620621054549,
_sslEngine.getSession().getLastAccessedTime(): 1620621054549,
_sslEngine.getHandshakeSession().isValid(): null handshake session
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - buffers.size(): 2, bufsSent:
false, bytesWritten: -1, _encryptedOutput: 0
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnection) - Looping detected: false, 0, 2
2021-05-10 10:01:40,645 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - bufCount: 2
2021-05-10 10:01:40,646 WARN [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnectionTLSDelegate) - _status: [ Status = CLOSED,
HandshakeStatus = NOT_HANDSHAKING, bytesConsumed = 0, bytesProduced = 0 ],
_sslEngine.isOutboundDone(): true, _sslEngine.isInboundDone(): true,
_sslEngine.getPeerHost(): null, _sslEngine.getPeerPort(): -1, buffers.size():
2, _encryptedOutput.size(): 0, _netOutputBuffer.remaining(): 152185,
_sslEngine.getSession().getPacketBufferSize(): 16709,
_sslEngine.getSession().getCreationTime(): 1620621054549,
_sslEngine.getSession().getLastAccessedTime(): 1620621054549,
_sslEngine.getHandshakeSession().isValid(): null handshake session
2021-05-10 10:01:40,646 INFO [IO-/127.0.0.1:52648]
(o.a.q.s.t.NonBlockingConnection) - Exception performing final write/close for
'/127.0.0.1:52648': Looping detected, executing circuit breaker
2021-05-10 10:01:40,647 INFO [Broker-Config] (q.m.c.close) -
[con:9,435,937(VTVTV_YYYYXXXXXZZZSSS@null/default)] CON-1002 : Close
{noformat}
Noticeable is SSLEngineResult.Status.CLOSED as well as
SSLEngineResult.HandshakeStatus NOT_HANDSHAKING. Combination of those statuses
indicates that the last operation closed broker side of the SSLEngine, or the
operation could not be completed because it was already closed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]