Jason Dillaman created QPID-4286:
------------------------------------

             Summary: QMF queries for HA replication take too long to process
                 Key: QPID-4286
                 URL: https://issues.apache.org/jira/browse/QPID-4286
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker
    Affects Versions: 0.18
            Reporter: Jason Dillaman


In an HA broker with approximately 12,000 queues, it takes roughly 10-14 
seconds for the first QMF response fragment to arrive.  While the QMF 
management agent is collecting the response, all other QMF-related 
functionality is blocked, which in turn blocks any thread that raises a QMF 
event.  

This causes clients to be disconnected from the broker because worker threads 
are blocked by QMF (either through missed heartbeats in an extreme case or 
through the 2 second handshake timeout).  It also causes the HA backup's 
federated link to be disconnected through missed heartbeats when the link 
heartbeat interval is set to a low value.  

If the HA backup loses its connection, it only exacerbates the issue since it 
will reconnect and re-query the QMF data that made it lose its connection in 
the first place.  

Recommend that QMF events not be blocked by a global management agent lock and 
also recommend that potentially long-running QMF queries be separated from the 
worker thread that initiated them to prevent a heartbeat timeout.
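
A minimal sketch of the two recommendations, under stated assumptions: the 
class AgentSketch and its methods (addObject, raiseEvent, eventsRaised, 
queryAsync) are hypothetical names made up for illustration and are not the 
broker's management agent API.  The idea is that the query path holds the 
lock only long enough to copy the object list, then builds the response on a 
separate thread, so a thread raising an event never waits on a long query.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a management agent; NOT the qpid::management API.
class AgentSketch {
public:
    void addObject(const std::string& name) {
        std::lock_guard<std::mutex> guard(lock_);
        objects_.push_back(name);
    }

    // Event path: the lock is held only long enough to record the event.
    void raiseEvent() {
        std::lock_guard<std::mutex> guard(lock_);
        ++eventsRaised_;
    }

    int eventsRaised() const {
        std::lock_guard<std::mutex> guard(lock_);
        return eventsRaised_;
    }

    // Query path: copy the object list under the lock, release the lock,
    // then do the (potentially slow) response building on another thread.
    // The "response" here is just a byte count standing in for encoding.
    std::future<std::size_t> queryAsync() {
        std::vector<std::string> snapshot;
        {
            std::lock_guard<std::mutex> guard(lock_);
            snapshot = objects_;
        }
        return std::async(std::launch::async, [snapshot]() {
            std::size_t bytes = 0;
            for (const std::string& name : snapshot) {
                bytes += name.size();
            }
            return bytes;
        });
    }

private:
    mutable std::mutex lock_;
    std::vector<std::string> objects_;
    int eventsRaised_ = 0;
};
```

A caller would start the query with queryAsync(), keep servicing heartbeats 
and events on the worker thread, and collect the result from the returned 
future later.  Whether a snapshot copy is cheap enough for a real broker with 
this many queues is an open question that the sketch does not answer.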


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
