[ 
https://issues.apache.org/jira/browse/QPID-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228244#comment-15228244
 ] 

Alan Conway commented on QPID-7182:
-----------------------------------

I can't reproduce on present trunk: eb1e785 origin/trunk origin/HEAD QPID-7149: 
Fixed messaging client shutdown.

Not I am using a different debug logging change to QueueReplicator.cpp, that 
only logs when the size grows, not on every dequeue:
{code}
    static size_t max = 0;
    if (e.ids.size() > max) {
        QPID_LOG(critical, logPrefix << "FIXME e.size()=" << e.ids.size());
        max = e.ids.size();
    }
{code}

2 theories to explain the reported observations:

1. The first bug report was from a build without "006496cf Bug 1318180 - HA 
memory leak in primary broker when overwriting messages in a ring queue" OR was 
somehow a consequence of "eb1e785 origin/trunk origin/HEAD QPID-7149: Fixed 
messaging client shutdown." 

2. The warning added to QueueReplicator.cpp in 
https://issues.apache.org/jira/browse/QPID-7182?focusedCommentId=15223921&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15223921
 showing growth of the deque events. I suspect the extra logging (on backup 
only for every dequeue event) slowed the backups down with respect to the 
primary. The primary began to accumulate large numbers of un-sent dequeues 
because of the slow down, which in turn meant the backups were getting huge 
dequeue records to process, which slowed them down even more and caused them to 
burn CPU.

> [C++ broker] high CPU usage on backup brokers following QPID-7149 scenario
> --------------------------------------------------------------------------
>
>                 Key: QPID-7182
>                 URL: https://issues.apache.org/jira/browse/QPID-7182
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: qpid-cpp-next
>            Reporter: Pavel Moravec
>            Assignee: Alan Conway
>
> Following scenario from QPID-7149 with --ha-replicate=all, with whatever 
> patch fixing it applied or not, CPU usage of backup brokers grow over the 
> time.
> gdb shows one active thread always with backtrace:
> {noformat}
> #0  0x00007f9295fc9c98 in find (this=0x7f9270055840, data=<value optimized 
> out>)
>     at 
> /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/hashtable:786
> #1  qpid::ha::QueueReplicator::dequeueEvent (this=0x7f9270055840, data=<value 
> optimized out>)
>     at /data_xfs/qpid/cpp/src/qpid/ha/QueueReplicator.cpp:306
> #2  0x00007f9295fca82b in operator() (this=0x7f9270055840, deliverable=<value 
> optimized out>)
>     at /usr/include/boost/function/function_template.hpp:1013
> #3  qpid::ha::QueueReplicator::route (this=0x7f9270055840, deliverable=<value 
> optimized out>)
>     at /data_xfs/qpid/cpp/src/qpid/ha/QueueReplicator.cpp:329
> #4  0x00007f9296b9b854 in qpid::broker::SemanticState::route 
> (this=0x7f927001d088, msg=..., strategy=...)
>     at /data_xfs/qpid/cpp/src/qpid/broker/SemanticState.cpp:506
> #5  0x00007f9296bb8ab7 in qpid::broker::SessionState::handleContent 
> (this=0x7f927001cec0, frame=<value optimized out>)
>     at /data_xfs/qpid/cpp/src/qpid/broker/SessionState.cpp:233
> #6  0x00007f9296bb90a1 in qpid::broker::SessionState::handleIn 
> (this=0x7f927001cec0, frame=...)
>     at /data_xfs/qpid/cpp/src/qpid/broker/SessionState.cpp:293
> #7  0x00007f92965d4c31 in qpid::amqp_0_10::SessionHandler::handleIn 
> (this=0x7f927002fbb0, f=...)
>     at /data_xfs/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:93
> #8  0x00007f9296b29a2b in operator() (this=0x7f9270002060, frame=...) at 
> /data_xfs/qpid/cpp/src/qpid/framing/Handler.h:39
> #9  qpid::broker::ConnectionHandler::handle (this=0x7f9270002060, frame=...)
>     at /data_xfs/qpid/cpp/src/qpid/broker/ConnectionHandler.cpp:93
> #10 0x00007f9296b247e8 in qpid::broker::amqp_0_10::Connection::received 
> (this=0x7f9270001e80, frame=...)
>     at /data_xfs/qpid/cpp/src/qpid/broker/amqp_0_10/Connection.cpp:198
> #11 0x00007f9296ab2863 in qpid::amqp_0_10::Connection::decode 
> (this=0x7f92700018b0, buffer=<value optimized out>, 
>     size=<value optimized out>) at 
> /data_xfs/qpid/cpp/src/qpid/amqp_0_10/Connection.cpp:59
> #12 0x00007f92965fdca0 in qpid::sys::AsynchIOHandler::readbuff 
> (this=0x7f92700009b0, buff=0x7f9270001880)
>     at /data_xfs/qpid/cpp/src/qpid/sys/AsynchIOHandler.cpp:138
> #13 0x00007f929657be89 in operator() (this=0x7f9270000a50, h=...) at 
> /usr/include/boost/function/function_template.hpp:1013
> #14 qpid::sys::posix::AsynchIO::readable (this=0x7f9270000a50, h=...) at 
> /data_xfs/qpid/cpp/src/qpid/sys/posix/AsynchIO.cpp:453
> #15 0x00007f92966025b3 in boost::function1<void, 
> qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, 
>     a0=<value optimized out>) at 
> /usr/include/boost/function/function_template.hpp:1013
> #16 0x00007f9296601246 in qpid::sys::DispatchHandle::processEvent 
> (this=0x7f9270000a58, type=qpid::sys::Poller::READABLE)
>     at /data_xfs/qpid/cpp/src/qpid/sys/DispatchHandle.cpp:280
> #17 0x00007f92965a1d1d in process (this=0x7961c0) at 
> /data_xfs/qpid/cpp/src/qpid/sys/Poller.h:131
> ..
> {noformat}
> or with:
> {noformat}
> #0  0x00000032c4c0a7b0 in pthread_mutex_unlock () from /lib64/libpthread.so.0
> #1  0x00007fb0958038fa in qpid::sys::Mutex::unlock (this=<value optimized 
> out>) at /data_ext4/qpid/cpp/src/qpid/sys/posix/Mutex.h:120
> #2  0x00007fb095840628 in ~ScopedLock (this=0x112cfd0, data=<value optimized 
> out>) at /data_ext4/qpid/cpp/src/qpid/sys/Mutex.h:34
> #3  qpid::ha::QueueReplicator::dequeueEvent (this=0x112cfd0, data=<value 
> optimized out>)
>     at /data_ext4/qpid/cpp/src/qpid/ha/QueueReplicator.cpp:308
> ..
> {noformat}
> Not sure where the busy loop origins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to