[ https://issues.apache.org/jira/browse/QPID-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983493#comment-13983493 ]
ASF subversion and git services commented on QPID-5719:
-------------------------------------------------------
Commit 1590777 from [~aconway] in branch 'qpid/trunk'
[ https://svn.apache.org/r1590777 ]
QPID-5719: Fix unintentional dependency on qpid-ha for non-HA installations.
qpidd init script should not use qpid-ha if it is not installed.
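As an illustration of the kind of guard described in the commit message, here is a minimal sketch in Python. It is not the code from r1590777 (the real init script is a shell script); the function name `ha_status` and its parameters are hypothetical. It only shows the idea of checking that the optional `qpid-ha` tool is present before calling it, and of bounding the call with a timeout, since the issue below reports that `qpid-ha` can hang.

```python
import shutil
import subprocess


def ha_status(tool="qpid-ha", which=shutil.which, timeout=10):
    """Return the tool's status output, or None if the tool is unavailable.

    Hypothetical helper: `tool`, `which` and `timeout` are illustrative
    parameters, not part of any Qpid API.
    """
    path = which(tool)
    if path is None:
        # Non-HA installation: the optional tool is absent, so skip it.
        return None
    try:
        result = subprocess.run(
            [path, "status"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The tool did not answer in time; treat it like "no status".
        return None
    return result.stdout.strip()
```

A caller would treat a `None` result as "HA is not in use or not answering" and continue with the plain broker start-up path.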
> HA becomes unresponsive once any of the brokers are SIGSTOPed
> -------------------------------------------------------------
>
> Key: QPID-5719
> URL: https://issues.apache.org/jira/browse/QPID-5719
> Project: Qpid
> Issue Type: Bug
> Components: C++ Clustering
> Affects Versions: 0.28
> Reporter: Alan Conway
> Assignee: Alan Conway
> Attachments: ha-heartbeat.diff
>
>
> See also: https://bugzilla.redhat.com/show_bug.cgi?id=1086638
> Description of problem:
> qpid HA becomes unresponsive once any of the brokers are SIGSTOPed.
> There are three different cases:
> a] stopped ALL brokers
> b] stopped the primary
> c] stopped a backup
> In each of the cases listed above, the following observations were made:
> a-c] RHCS clustat looks fine and reports that everything is OK
> a-c] qpid-ha (status --all) hangs
> a,b,c*] all other clients are blocked indefinitely
> a-b] in these cases, directly from the beginning
> c*] in this case, at the end; the client is able to recover after a
> minute or so, due to the connection timeout
> In fact, this defect also proves that qpid-ha can be out of sync with
> clustat, as tracked by the BZ.
> The expectations are:
> * a] quorum lost, HA down (same as kill -9 on all nodes);
> no clients are able to communicate
> * b] a new primary is promoted; there has to be a mechanism to get rid of
> the stopped process;
> clients should be able to communicate after recovery
> * c] the unresponsive backup should get restarted;
> clients should be able to communicate once the backup has been
> detected as unresponsive
> * Generally, better integration between the Qpid HA environment and RHCS
> is needed, i.e. SIGSTOP detection
> * A heartbeat between the primary and the backups is probably needed
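
The last expectation can be sketched as follows. A SIGSTOPed process keeps its sockets open, so its peers see no connection error; only the absence of periodic heartbeats reveals that it has stopped responding. The Python sketch below is a generic illustration of timeout-based heartbeat detection, not the Qpid HA implementation and not the attached ha-heartbeat.diff; the class name `HeartbeatMonitor` and its methods are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence tolerated per peer
        self.clock = clock      # injectable clock, useful for testing
        self.last_seen = {}     # peer name -> time of last heartbeat

    def beat(self, peer):
        """Record that a heartbeat from `peer` has just arrived."""
        self.last_seen[peer] = self.clock()

    def unresponsive(self):
        """Return the sorted names of peers silent for longer than `timeout`."""
        now = self.clock()
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A cluster manager could poll `unresponsive()` periodically and, depending on the peer's role, either promote a new primary or restart the stopped backup, as listed in the expectations above.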
--
This message was sent by Atlassian JIRA
(v6.2#6252)