[jira] [Commented] (KARAF-7861) Configuration replication missed due to race condition in cellar

Kevan Jahanshahi (Jira) Wed, 11 Sep 2024 00:31:04 -0700


    [ 
https://issues.apache.org/jira/browse/KARAF-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880885#comment-17880885
 ]


Kevan Jahanshahi commented on KARAF-7861:
-----------------------------------------

Thanks Jerome for the detailed explanation of the problem. I was browsing 
open tickets to see if anything was related to this, and I found this one: 
https://issues.apache.org/jira/browse/KARAF-5969

It seems to go in the direction of your option 2 (relying fully on the 
replicated map and registering listeners on the map directly).
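
To make the idea of option 2 concrete, here is a minimal sketch in plain Java. 
It does not use Hazelcast or Cellar: ReplicaMap, addEntryListener and 
applyReplicatedUpdate are hypothetical stand-ins, and the real listener API 
may differ. The point is only that the listener fires after the entry has 
landed in the node's own replica, so there is no separate event that can 
arrive too early.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class MapListenerSketch {

    /** Hypothetical stand-in for one node's replica of a replicated map. */
    static final class ReplicaMap {
        private final Map<String, String> entries = new ConcurrentHashMap<>();
        private final List<BiConsumer<String, String>> listeners =
                new CopyOnWriteArrayList<>();

        void addEntryListener(BiConsumer<String, String> listener) {
            listeners.add(listener);
        }

        /** Called when a replicated update actually reaches this node. */
        void applyReplicatedUpdate(String key, String value) {
            entries.put(key, value);                      // 1. store the entry
            listeners.forEach(l -> l.accept(key, value)); // 2. then notify
        }
    }

    /** Stand-in for the target node's local configuration store. */
    static final Map<String, String> localConfig = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        localConfig.put("my.pid", "v1");

        ReplicaMap replica = new ReplicaMap();
        // The listener receives the replicated value itself, so it cannot
        // observe a replica that is still behind the notification.
        replica.addEntryListener((pid, value) -> {
            if (!value.equals(localConfig.get(pid))) {
                localConfig.put(pid, value);
            }
        });

        replica.applyReplicatedUpdate("my.pid", "v2");
        System.out.println("local=" + localConfig.get("my.pid"));
    }
}
```

In this sketch the notification and the data travel together, which is the 
property the separate ClusterConfigurationEvent does not have.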

> Configuration replication missed due to race condition in cellar
> ----------------------------------------------------------------
>
>                 Key: KARAF-7861
>                 URL: https://issues.apache.org/jira/browse/KARAF-7861
>             Project: Karaf
>          Issue Type: Bug
>          Components: cellar
>         Environment: Karaf using cellar in a clustered environment to 
> replicate configuration updates.
>            Reporter: Jerome Blanchard
>            Priority: Major
>
> In a Karaf cluster using Cellar, and more specifically cellar-config, an 
> update of a configuration on one node may not be replicated to another node.
> Investigations point to a race condition where one node receives the 
> ClusterConfigurationEvent before the ReplicatedMap is effectively replicated 
> on the impacted node. Thus, the node does not store the configuration and the 
> local version stays stale.
> The race condition starts here :
> [https://github.com/Jahia/karaf-cellar/blob/47b6984217953a5263f7e1e0da040f488cef3a3e/config/src/main/java/org/apache/karaf/cellar/config/LocalConfigurationListener.java#L119-L127]
> and continues on another node here :
> [https://github.com/Jahia/karaf-cellar/blob/cellar-4.1.3-jahia-fixes/config/src/main/java/org/apache/karaf/cellar/config/ConfigurationEventHandler.java]
> Cellar uses a ReplicatedMap (Hazelcast) to propagate configurations 
> across the cluster, and the replication operation is asynchronous. Thus, if 
> the ClusterConfigurationEvent is received before the replication finishes on 
> the target node, nothing happens: no error is detected and no retry occurs.
> To reproduce the problem, we can use breakpoints (thread-level ones):
>  * First one to simulate a long replicate operation, by adding a breakpoint 
> on the emitting node in the class 
> *com.hazelcast.replicatedmap.impl.operation.ReplicateUpdateOperation.run()*
>  * Second one in the Cellar event listener that applies the replicated 
> configuration: 
> *org.apache.karaf.cellar.config.ConfigurationEventHandler.handle()* at line:  
> if (!equals(clusterDictionary, localDictionary) && 
> canDistributeConfig(localDictionary)) {
> Now update a configuration on the first node. On the target node, we can 
> see that the configuration is not updated when the event is received.
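
The ordering described in the issue can be reproduced without a cluster. The 
following is a minimal sketch in plain Java, not Cellar or Hazelcast code: 
the two maps, handleEvent and the "my.pid" key are hypothetical stand-ins, 
and a latch plays the role of the breakpoint that delays replication. The 
handler only mirrors the "apply if cluster and local differ" check quoted 
above, in simplified form.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplicationRaceSketch {

    /** Stand-in for the target node's copy of the replicated map. */
    static final Map<String, String> targetReplica = new ConcurrentHashMap<>();
    /** Stand-in for the target node's local configuration store. */
    static final Map<String, String> targetLocal = new ConcurrentHashMap<>();

    /** Simplified handler: apply the cluster value only if it differs. */
    static boolean handleEvent(String pid) {
        String cluster = targetReplica.get(pid);
        String local = targetLocal.get(pid);
        if (cluster != null && !cluster.equals(local)) {
            targetLocal.put(pid, cluster);
            return true;
        }
        return false; // looks up to date: ignored silently, no retry
    }

    public static void main(String[] args) throws Exception {
        // Both copies start in sync on the target node.
        targetReplica.put("my.pid", "v1");
        targetLocal.put("my.pid", "v1");

        CountDownLatch eventHandled = new CountDownLatch(1);
        ExecutorService replicator = Executors.newSingleThreadExecutor();

        // Asynchronous replication of "v2", held back until the event has
        // been handled (this is what the first breakpoint simulates).
        replicator.submit(() -> {
            try {
                eventHandled.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            targetReplica.put("my.pid", "v2");
        });

        // The event arrives first and sees the old replica content.
        boolean applied = handleEvent("my.pid");
        eventHandled.countDown();

        replicator.shutdown();
        replicator.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println("applied=" + applied);
        System.out.println("replica=" + targetReplica.get("my.pid"));
        System.out.println("local=" + targetLocal.get("my.pid"));
    }
}
```

At the end of the run the replica holds the new value while the local 
configuration still holds the old one, and nothing triggers a second attempt.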



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
