[jira] [Work logged] (ARTEMIS-2716) Implements pluggable Quorum Vote

ASF GitHub Bot (Jira) Thu, 27 May 2021 03:51:06 -0700


     [ https://issues.apache.org/jira/browse/ARTEMIS-2716?focusedWorklogId=602870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602870 ]


ASF GitHub Bot logged work on ARTEMIS-2716:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/May/21 10:50
            Start Date: 27/May/21 10:50
    Worklog Time Spent: 10m 
      Work Description: franz1981 edited a comment on pull request #3555:
URL: https://github.com/apache/activemq-artemis/pull/3555#issuecomment-849440671


   Update on:
   > why backup hasn't committed suicide given that it's not able to start as a live?
   
   Assuming `Atomix` behaves correctly, it seems to be related to a bug on my side:
   ```java
      private void startAsLive(final DistributedLock liveLock) throws Exception {
            // ...
            // IMPORTANT:
            // we're setting this activation JUST because it would allow the server to use its
            // getActivationChannelHandler to handle replication
            final ReplicationPrimaryActivation primaryActivation =
               new ReplicationPrimaryActivation(activeMQServer, distributedManager, policy.getLivePolicy());
            liveLock.addListener(primaryActivation);
            activeMQServer.setActivation(primaryActivation);
            activeMQServer.initialisePart2(false);
            final boolean stillLive;
            try {
               stillLive = liveLock.isHeldByCaller();
            } catch (UnavailableStateException e) {
               LOGGER.warn(e);
               throw new ActiveMQIllegalStateException("This server cannot check its role as a live: activation is failed");
            }
            if (!stillLive) {
               throw new ActiveMQIllegalStateException("This server is not live anymore: activation is failed");
            }
            // ...
   ```
   If the quorum is lost before `liveLock.addListener(primaryActivation)`, the current `AtomixDistributedLock::onStateChanged`:
   ```java
      private void onStateChanged(PrimitiveState state) {
         LOGGER.info(state);
         switch (state) {
            case SUSPENDED:
            case EXPIRED:
            case CLOSED:
               for (LockListener listener : listeners) {
                  listener.stateChanged(LockListener.EventType.UNAVAILABLE);
               }
               break;
         }
      }
   ```  
   will find `listeners` empty, so `ReplicationPrimaryActivation::stateChanged` won't be called to asynchronously stop the server.
   The late `liveLock.isHeldByCaller` check should fail and throw an exception, but that won't cause the server to stop.
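   
   A minimal, self-contained sketch of that ordering race, assuming a hypothetical `ToyLock` that only mimics the "notify the listeners registered at the moment the state changes" shape of `onStateChanged`; it is not the `AtomixDistributedLock` API, and `stopRequested` is an illustrative helper.
   
   ```java
   import java.util.List;
   import java.util.concurrent.CopyOnWriteArrayList;
   import java.util.concurrent.atomic.AtomicBoolean;
   
   public class ListenerRaceDemo {
   
      // Hypothetical lock stand-in: NOT the AtomixDistributedLock API.
      static final class ToyLock {
         interface Listener {
            void unavailable();
         }
   
         private final List<Listener> listeners = new CopyOnWriteArrayList<>();
         private volatile boolean held = true;
   
         void addListener(Listener listener) {
            listeners.add(listener);
         }
   
         boolean isHeldByCaller() {
            return held;
         }
   
         // Like onStateChanged(SUSPENDED/EXPIRED/CLOSED): only the listeners present right now are notified.
         void loseQuorum() {
            held = false;
            for (Listener listener : listeners) {
               listener.unavailable();
            }
         }
      }
   
      // Returns whether the listener asked the "server" to stop.
      static boolean stopRequested(boolean quorumLostBeforeAddListener) {
         ToyLock liveLock = new ToyLock();
         AtomicBoolean stop = new AtomicBoolean(false);
         if (quorumLostBeforeAddListener) {
            liveLock.loseQuorum(); // no listener registered yet: the event is dropped
         }
         liveLock.addListener(() -> stop.set(true));
         if (!quorumLostBeforeAddListener) {
            liveLock.loseQuorum(); // listener registered: it is notified
         }
         return stop.get();
      }
   
      public static void main(String[] args) {
         System.out.println(stopRequested(false)); // true
         System.out.println(stopRequested(true));  // false, even though isHeldByCaller() now returns false
      }
   }
   ```
   
   The second call shows the problem in isolation: the lock state is already lost, but nothing that was registered afterwards ever hears about it.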
   
   In short, the issue is that a failed `liveLock.isHeldByCaller` check must be able to stop the server: on primary activation this seems to happen thanks to the `activeMQServer.callActivationFailureListeners(e)` call from https://issues.apache.org/jira/browse/ARTEMIS-388, which isn't used on backup activation.
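   
   A toy model of the difference, under stated assumptions: `ToyServer`, `Activation` and `stoppedAfter` are hypothetical, and only the `callActivationFailureListeners` name mirrors the method mentioned above. The "primary-style" path reports the failure to the registered listeners (one of which stops the server); the "backup-style" path is modelled, as an assumption, as just logging the exception.
   
   ```java
   import java.util.List;
   import java.util.concurrent.CopyOnWriteArrayList;
   import java.util.function.Consumer;
   
   public class FailureListenerDemo {
   
      // Hypothetical stand-in for an activation's run logic.
      interface Activation {
         void run() throws Exception;
      }
   
      // Toy server: only models activation failure listeners, not Artemis' server implementation.
      static final class ToyServer {
         private final List<Consumer<Exception>> failureListeners = new CopyOnWriteArrayList<>();
         private volatile boolean stopped;
   
         void registerActivationFailureListener(Consumer<Exception> listener) {
            failureListeners.add(listener);
         }
   
         void callActivationFailureListeners(Exception e) {
            for (Consumer<Exception> listener : failureListeners) {
               listener.accept(e);
            }
         }
   
         void stop() {
            stopped = true;
         }
   
         boolean isStopped() {
            return stopped;
         }
   
         // Primary-style: a failed activation is reported to the failure listeners.
         void runPrimaryActivation(Activation activation) {
            try {
               activation.run();
            } catch (Exception e) {
               callActivationFailureListeners(e);
            }
         }
   
         // Backup-style (assumption): the failure is only logged, nobody stops the server.
         void runBackupActivation(Activation activation) {
            try {
               activation.run();
            } catch (Exception e) {
               System.err.println("activation failed: " + e.getMessage());
            }
         }
      }
   
      static boolean stoppedAfter(boolean primaryStyle) {
         ToyServer server = new ToyServer();
         server.registerActivationFailureListener(e -> server.stop());
         // Simulates the late isHeldByCaller check failing.
         Activation failingLateCheck = () -> {
            throw new IllegalStateException("This server is not live anymore: activation is failed");
         };
         if (primaryStyle) {
            server.runPrimaryActivation(failingLateCheck);
         } else {
            server.runBackupActivation(failingLateCheck);
         }
         return server.isStopped();
      }
   
      public static void main(String[] args) {
         System.out.println(stoppedAfter(true));  // true
         System.out.println(stoppedAfter(false)); // false
      }
   }
   ```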
   
   I'm investigating why `activeMQServer.callActivationFailureListeners(e);` isn't used in any of the existing backup activations (shared nothing, shared store...); if it turns out not to be a viable option, I'm going to call `AtomixDistributedLock::onStateChanged` in case of a lost lock, before throwing the exception, to asynchronously stop the server.
   The same timing issue could happen with ZooKeeper too, so it's worth fixing.
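   
   A sketch of that fallback, assuming a hypothetical `ToyLock` (not the `AtomixDistributedLock` API), a hypothetical `lateCheck` helper standing in for the end of `startAsLive`, and a single-thread executor standing in for the asynchronous stop: re-firing the "unavailable" notification right before throwing lets a listener registered after the real event still react.
   
   ```java
   import java.util.List;
   import java.util.concurrent.CopyOnWriteArrayList;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;
   
   public class LateCheckFixDemo {
   
      // Hypothetical lock stand-in: NOT the AtomixDistributedLock API.
      static final class ToyLock {
         interface Listener {
            void unavailable();
         }
   
         private final List<Listener> listeners = new CopyOnWriteArrayList<>();
         private volatile boolean held = true;
   
         void addListener(Listener listener) { listeners.add(listener); }
   
         boolean isHeldByCaller() { return held; }
   
         void loseQuorum() {
            held = false;
            notifyUnavailable();
         }
   
         // The same loop onStateChanged runs for SUSPENDED/EXPIRED/CLOSED.
         void notifyUnavailable() {
            for (Listener listener : listeners) {
               listener.unavailable();
            }
         }
      }
   
      // Hypothetical late check: notify the listeners first, then fail the activation.
      static void lateCheck(ToyLock liveLock) {
         if (!liveLock.isHeldByCaller()) {
            liveLock.notifyUnavailable(); // re-fire so late-registered listeners still react
            throw new IllegalStateException("This server is not live anymore: activation is failed");
         }
      }
   
      static boolean serverStopped() {
         ExecutorService executor = Executors.newSingleThreadExecutor();
         CountDownLatch stopped = new CountDownLatch(1);
         ToyLock liveLock = new ToyLock();
         liveLock.loseQuorum(); // the race: quorum lost before any listener exists
         // The listener "stops the server" asynchronously.
         liveLock.addListener(() -> executor.execute(stopped::countDown));
         try {
            lateCheck(liveLock);
         } catch (IllegalStateException expected) {
            // the activation still fails, as before
         }
         boolean result;
         try {
            result = stopped.await(1, TimeUnit.SECONDS);
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result = false;
         } finally {
            executor.shutdown();
         }
         return result;
      }
   
      public static void main(String[] args) {
         System.out.println(serverStopped()); // true
      }
   }
   ```
   
   A real listener may be notified twice (once by the genuine state change, once by the re-fire), so the stop it triggers should be idempotent.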
   
   As for the Atomix issue, I'm going to investigate a bit more, because `AtomixDistributedLock::onStateChanged` doesn't seem to have been called even though `liveLock.isHeldByCaller` failed, which looks like a bug.
   
   
   
     
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 602870)
    Time Spent: 6h  (was: 5h 50m)

> Implements pluggable Quorum Vote
> --------------------------------
>
>                 Key: ARTEMIS-2716
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2716
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>         Attachments: backup.png, primary.png
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> This task aims to deliver a new Quorum Vote mechanism for Artemis with the following objectives:
> # to make it pluggable
> # to cleanly separate the election phase and the cluster member states
> # to simplify the most common setups in terms of both configuration and requirements (e.g. "witness" nodes could be implemented to support single master-slave pairs)
> Follow-up actions to help people adopt it, which need to be thought through upfront:
> # a clean upgrade path for current HA replication users
> # deprecate or integrate the current HA replication into the new version



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
