[jira] [Updated] (QPID-8489) Connection thread looping

Daniil Kirilyuk (Jira) Thu, 10 Dec 2020 05:54:07 -0800


     [ 
https://issues.apache.org/jira/browse/QPID-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Daniil Kirilyuk updated QPID-8489:
----------------------------------
    Attachment: QPID-8489 - java.ssl.debug.log

> Connection thread looping
> -------------------------
>
>                 Key: QPID-8489
>                 URL: https://issues.apache.org/jira/browse/QPID-8489
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.2
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: QPID-8489 - java.ssl.debug.log, QPID-8489.log, 
> simple_recv.cpp, thread-dump.st
>
>
> Error happens quite rarely and is not easy to reproduce. Although the problem 
> was partly fixed by fixing QPID-8477, it still can be reproduced. The main 
> symptom is significant increase of CPU usage even when no messages are sent 
> to broker anymore. CPU usage can rise from 30% to 90% and higher, making 
> broker unusable. After such CPU rise the only way to fix broker will be 
> restarting it.
> Analysis has shown, that error occurs with CPP proton client in cases when
> 1) SSL connection is used
> 2) connection errors on client side are ignored
> 3) connection is dropped due to the client process termination / network 
> disconnection
> *Steps to reproduce*
>  # Java broker should be installed
>  # Broker should be configured to allow one connection
>  # Prepare certificates
>  # Install Qpid::Proton 0.28.0
> wget 
> [http://archive.apache.org/dist/qpid/proton/0.28.0/qpid-proton-0.28.0.tar.gz]
> gunzip qpid-proton-0.28.0.tar.gz
> mkdir -p qpid-proton-0.28.0/build && pushd qpid-proton-0.28.0/build && cmake 
> .. && make all && popd
>  # Replace and edit example *qpid-proton-0.28.0/cpp/examples/simple_recv.cpp* 
> with the one attached
>  # Build again
> cd qpid-proton-0.28.0/build
> make
>  # Break the broker
> ./cpp/examples/simple_recv & ./cpp/examples/simple_recv
> Connection error
> {color:#ffab00}^C <= Hit Ctrl+C to kill process{color}
>  # {color:#172b4d}If CPU usage didn't increased, find the PID of the first 
> simple_recv process using ps-ef | grep simple_recv and kill it using kill -9 
> PID.{color}
> *Analysis*
> CPU usage rises when connection is dropped on the client side or when network 
> is broken between client and broker. The main point is that client isn't well 
> behaved and connection shouldn't be closed correctly.
> On broker side connection becomes "orphaned": it is still maintained by 
> broker, but no real reading / writing is performed. Following method calls 
> are performed in an endless loop for each "orphaned" connection:
> SelectorThread.performSelect() 
> SelectorThread.ConnectionProcessor.processConnection()
> NetworkConnectionScheduler.processConnection()
> NonBlockingConnection.doWork()
> As there nothing physically read or written, both methods 
> NonBlockingConnection.doRead() and NonBlockingConnection.doWrite() execute 
> very fast (several milliseconds) without any blocking processes and after 
> that connection is immediately rescheduled for processing in 
> NetworkConnectionScheduler. After that loop repeats.
> As the connection lifecycle is normal, there is logged nothing unusual or 
> suspicious (nothing is seen in log at all).
> In thread dump (see attachment) there is seen, that utilized are mostly 
> thread with names virtualhost-default-iopool-XX. Typical stacktrace looks 
> like following:
> {noformat}
> "virtualhost-default-iopool-39" #92 prio=5 os_prio=0 tid=0x00007f47ec335800 
> nid=0x37196 waiting on condition 
> [0x00007f476a4e3000]"virtualhost-default-iopool-39" #92 prio=5 os_prio=0 
> tid=0x00007f47ec335800 nid=0x37196 waiting on condition [0x00007f476a4e3000]  
>  java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native 
> Method) - parking to wait for  <0x00000000f39105d0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
> at 
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:532) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
>  at 
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$18/422330142.run(Unknown
>  Source) at java.lang.Thread.run(Thread.java:748)}
> {noformat}
> The main symptom of an error is rising CPU usage, which can reach up to 90% 
> in case, when several connections are "orphaned". Additional factor leading 
> to the problem is disabled keep-alive option for a connection or long 
> keep-alive interval.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (QPID-8489) Connection thread looping

Reply via email to