[jira] [Commented] (ARTEMIS-2941) Improve JDBC HA connection resiliency

ASF subversion and git services (Jira) Wed, 28 Oct 2020 05:22:24 -0700


    [ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222125#comment-17222125 ]


ASF subversion and git services commented on ARTEMIS-2941:
----------------------------------------------------------

Commit 647151b0aff8f1245735bfbc6e8d22d1cdee0afb in activemq-artemis's branch 
refs/heads/master from gtully
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=647151b ]

ARTEMIS-2941 - renew tasks are nearly always a little late, make this test more 
tolerant of that


> Improve JDBC HA connection resiliency
> -------------------------------------
>
>                 Key: ARTEMIS-2941
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2941
>             Project: ActiveMQ Artemis
>          Issue Type: Improvement
>          Components: Broker
>    Affects Versions: 2.15.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> This aims to replace the restart enhancement of 
> https://issues.apache.org/jira/browse/ARTEMIS-2918, because the latter is 
> too dangerous: allowing a production server to restart while keeping the 
> Java process around exposes it to numerous potential leaks.
> Currently, JDBC HA uses an expiration time on locks, marking the time until 
> which a server instance is allowed to keep a specific role, depending on the 
> lock it owns (live or backup).
> Right now, the first failed attempt to renew such an expiration time forces 
> a broker to shut down immediately, while it could be more "relaxed" and just 
> keep retrying until the very end, i.e. until the expiration time is 
> approaching.
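A minimal sketch of the "relaxed" renewal described above. The class LeaseRenewer and its parameters (tryRenew, safetyMarginMillis) are hypothetical and are not the Artemis lease-lock API. For simplicity it computes the renewed expiration with the local clock, whereas the text notes that the DBMS time is used to set it; clock differences are ignored here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Artemis code): on each renew tick, try to renew the lease;
 * if renewal fails, keep the current role as long as the local expiration time,
 * minus a safety margin, has not been reached.
 */
public final class LeaseRenewer {

    private final LongSupplier clock;        // local time source in milliseconds (assumption)
    private final BooleanSupplier tryRenew;  // one renewal attempt against the DBMS (assumption)
    private final long expirationMillis;     // lease duration granted by a successful renewal
    private final long safetyMarginMillis;   // give up this long before the lease expires
    private long localExpiration;            // local deadline for holding the current role

    public LeaseRenewer(LongSupplier clock, BooleanSupplier tryRenew,
                        long expirationMillis, long safetyMarginMillis) {
        this.clock = clock;
        this.tryRenew = tryRenew;
        this.expirationMillis = expirationMillis;
        this.safetyMarginMillis = safetyMarginMillis;
        // Assume the lock has just been acquired when the renewer is created.
        this.localExpiration = clock.getAsLong() + expirationMillis;
    }

    /** Returns true if the role can still be kept, false if the broker must shut down. */
    public boolean onRenewTick() {
        long now = clock.getAsLong();
        if (tryRenew.getAsBoolean()) {
            localExpiration = now + expirationMillis;
            return true;
        }
        // Renewal failed: instead of shutting down immediately, keep retrying on
        // later ticks while there is still time left before the lease expires.
        return now < localExpiration - safetyMarginMillis;
    }

    public long localExpiration() {
        return localExpiration;
    }
}
```

A periodic task, for example one scheduled with a ScheduledExecutorService, would call onRenewTick() and shut the broker down only once it returns false. The safety margin leaves some room because renew tasks tend to run a little late.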
>  
> The only concern with this feature is the relationship between the broker 
> wall-clock time and the DBMS one: the DBMS time is used to set the expiration 
> time, so the two clocks should stay within certain margins of each other.
> For this last part, I'm aware that ActiveMQ Classic lease locks use a 
> configuration parameter to set the magnitude of the allowed difference (and 
> to compute a base offset too).
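The clock-difference concern can be illustrated with a few hypothetical helpers. They assume the DBMS time is sampled with a query (for example SELECT CURRENT_TIMESTAMP) issued between two local clock readings. The names ClockSkew, offsetMillis and maxAllowedDiffMillis are illustrative only; they are not the actual ActiveMQ Classic parameter mentioned above, nor Artemis code.

```java
/**
 * Hypothetical helpers (not Artemis or ActiveMQ Classic code) for reasoning about the
 * difference between the broker wall clock and the DBMS clock.
 */
public final class ClockSkew {

    private ClockSkew() {
    }

    /**
     * Estimates dbmsTime - localTime. The DBMS time is sampled between two local
     * readings, so using their midpoint bounds the error by half the round-trip time.
     */
    public static long offsetMillis(long localBeforeMillis, long dbmsNowMillis, long localAfterMillis) {
        long localMidpoint = localBeforeMillis + (localAfterMillis - localBeforeMillis) / 2;
        return dbmsNowMillis - localMidpoint;
    }

    /** True if the measured offset is within the configured tolerance. */
    public static boolean withinAllowedDiff(long offsetMillis, long maxAllowedDiffMillis) {
        return Math.abs(offsetMillis) <= maxAllowedDiffMillis;
    }

    /** Converts an expiration time written using the DBMS clock to the local clock. */
    public static long toLocalTime(long dbmsTimeMillis, long offsetMillis) {
        return dbmsTimeMillis - offsetMillis;
    }
}
```

A broker could refuse to rely on its local expiration time when withinAllowedDiff returns false, and otherwise use toLocalTime to decide locally how long it may keep retrying.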
>  
> Right now this feature seems less risky and more appealing than 
> https://issues.apache.org/jira/browse/ARTEMIS-2918, given that it narrows the 
> scope to the very core issue, i.e. more resilient behaviour when JDBC 
> connectivity is lost.
>  
> To understand the implications of such a change, consider a shared-store HA 
> pair configured with a 60-second expiration time:
>  # the DBMS goes down
>  # an in-flight persistent operation on the live data store causes the live 
> broker to kill itself immediately, because no reliable storage is connected
>  # the backup is unable to renew its backup lease lock
>  # the DBMS comes back up in time, before the backup lock's local expiration 
> time has passed
>  # the backup is able to renew its backup lease lock, retrieve the very last 
> state of the live (that it was live) and, if no script is configured to 
> restart the live, fail over and take its role
>  # the backup is now live and able to serve clients
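The backup side of the scenario (steps 3 to 5) can be simulated with a small timeline sketch. It assumes the backup acquired its lease at t = 0 and attempts a renewal every periodMillis; the class and method names are hypothetical, and the live broker (step 2) is not modelled.

```java
/**
 * Hypothetical timeline sketch of the backup side of the scenario above
 * (not Artemis code).
 */
public final class OutageTimeline {

    private OutageTimeline() {
    }

    /**
     * The backup acquires its lease at t = 0 and tries to renew it every periodMillis.
     * The DBMS is unreachable during [downAtMillis, upAtMillis). Returns the time of the
     * first successful renewal after the outage, or -1 if the local expiration time is
     * reached first, in which case the backup would have to shut down.
     */
    public static long firstRenewalAfterOutage(long periodMillis, long expirationMillis,
                                               long downAtMillis, long upAtMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        long localExpiration = expirationMillis;
        for (long t = periodMillis; ; t += periodMillis) {
            if (t >= localExpiration) {
                return -1; // a renewal at or after the deadline is too late
            }
            boolean dbmsUp = t < downAtMillis || t >= upAtMillis;
            if (dbmsUp) {
                localExpiration = t + expirationMillis; // renewal succeeded
                if (t >= upAtMillis) {
                    return t; // first renewal once the DBMS is back
                }
            }
        }
    }
}
```

With a 10-second renew period and the 60-second expiration from the scenario, an outage from 25 s to 70 s is survived (the first renewal after the outage happens at 70 s), while an outage lasting until 95 s is not (the lease expires at 80 s, measured from the last successful renewal at 20 s).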
>  
>  
> There are two legitimate questions about potential improvements here:
>  # why can't the live keep retrying I/O (on the journal, paging or large 
> messages) until its local expiration time ends?
>  # why doesn't the live just return an I/O error to the clients?
>  
> The former is complex: the main problem I see is resource utilization. 
> Keeping an accumulating backlog of pending requests, all blocked waiting on 
> the last one for an arbitrarily long time, will probably blow up the broker 
> memory, not to mention that clients will time out too.
> The latter seems more appealing, because it would allow clients to fail 
> fast, but it would affect the current semantics of the broker storage 
> operations, and I need to investigate further to understand how to 
> implement it.
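The fail-fast option raised by the second question can be sketched as a storage wrapper that completes an operation exceptionally as soon as storage is known to be unavailable, rather than queueing it. FailFastStore and setStorageAvailable are hypothetical names, not Artemis APIs; the sketch requires Java 9+ for CompletableFuture.failedFuture and deliberately ignores the changes to the broker storage semantics that the text says still need investigation.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Artemis code) of the fail-fast option: when storage is
 * unavailable, an operation completes exceptionally right away instead of being
 * queued behind earlier requests.
 */
public final class FailFastStore {

    private final AtomicBoolean storageAvailable = new AtomicBoolean(true);

    /** Would be driven by a connectivity check against the DBMS (assumption). */
    public void setStorageAvailable(boolean available) {
        storageAvailable.set(available);
    }

    /** Runs the operation, or returns an already-failed future carrying an I/O error. */
    public <T> CompletableFuture<T> submit(Supplier<T> operation) {
        if (!storageAvailable.get()) {
            // Fail fast: nothing is buffered, so no backlog can build up in memory.
            return CompletableFuture.failedFuture(new IOException("storage unavailable"));
        }
        try {
            // Executed synchronously to keep the sketch simple.
            return CompletableFuture.completedFuture(operation.get());
        } catch (RuntimeException e) {
            return CompletableFuture.failedFuture(e);
        }
    }
}
```

The trade-off is the one described above: clients get an immediate error they can react to, instead of a request that may block until it times out.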
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
