[
https://issues.apache.org/jira/browse/KARAF-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880885#comment-17880885
]
Kevan Jahanshahi commented on KARAF-7861:
-----------------------------------------
Thanks Jerome for the detailed explanation of the problem. I was browsing
open tickets to see if something was related to this and I found this one:
https://issues.apache.org/jira/browse/KARAF-5969
which seems to go in the direction of your option 2 (relying fully on the
replicated map, with listeners directly on the map).
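
A rough sketch of why listening on the map itself would avoid the race,
using plain JDK types only. This is not Hazelcast or Cellar code: the class
ConfigReplicationSketch and the methods remotePut, applyReplication,
addEntryListener and handleEvent are hypothetical names made up for the
illustration, and the asynchronous replication is simulated by an explicit
call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy stand-in for a replicated map: a remote write only becomes visible
 *  locally once applyReplication() is called (simulated async replication). */
class ConfigReplicationSketch {
    private final Map<String, String> local = new HashMap<>();
    private final List<String[]> inFlight = new ArrayList<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    /** Register a callback fired when an entry actually lands in the local replica. */
    public void addEntryListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    /** Another node wrote pid=value; the replica has not reached this node yet. */
    public void remotePut(String pid, String value) {
        inFlight.add(new String[] {pid, value});
    }

    /** The replication finally arrives: store the entries, then notify listeners. */
    public void applyReplication() {
        for (String[] entry : inFlight) {
            local.put(entry[0], entry[1]);
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(entry[0], entry[1]);
            }
        }
        inFlight.clear();
    }

    public String get(String pid) {
        return local.get(pid);
    }

    /** Event-driven handling as described in the issue: read the map at the
     *  moment the cluster event arrives. Returns true if the config was applied. */
    public static boolean handleEvent(ConfigReplicationSketch map, String pid,
                                      Map<String, String> localConfig) {
        String clusterValue = map.get(pid);
        if (clusterValue == null || clusterValue.equals(localConfig.get(pid))) {
            return false; // nothing visible yet: the update is silently dropped
        }
        localConfig.put(pid, clusterValue);
        return true;
    }

    public static void main(String[] args) {
        ConfigReplicationSketch map = new ConfigReplicationSketch();
        Map<String, String> localConfig = new HashMap<>();

        // Event-based path: the event overtakes the replication and is lost.
        map.remotePut("my.pid", "v2");
        System.out.println("applied on event: " + handleEvent(map, "my.pid", localConfig));

        // Listener-based path: the update is applied when the entry really arrives.
        map.addEntryListener(localConfig::put);
        map.applyReplication();
        System.out.println("local value after replication: " + localConfig.get("my.pid"));
    }
}
```

With a listener on the map there is no separate event that can arrive before
the data, so no retry logic would be needed on the receiving node.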
> Configuration replication missed due to race condition in cellar
> ----------------------------------------------------------------
>
> Key: KARAF-7861
> URL: https://issues.apache.org/jira/browse/KARAF-7861
> Project: Karaf
> Issue Type: Bug
> Components: cellar
> Environment: Karaf using cellar in a clustered environment to
> replicate configuration updates.
> Reporter: Jerome Blanchard
> Priority: Major
>
> In a Karaf cluster using cellar, and more specifically cellar-config, an
> update of a configuration on one node is not replicated to another node.
> Investigations point to a race condition where one node receives the
> ClusterConfigurationEvent before the ReplicatedMap is effectively replicated
> on the impacted node. Thus, the node does not store the configuration and the
> local version stays stale.
> The race condition starts here:
> [https://github.com/Jahia/karaf-cellar/blob/47b6984217953a5263f7e1e0da040f488cef3a3e/config/src/main/java/org/apache/karaf/cellar/config/LocalConfigurationListener.java#L119-L127]
> and continues on another node here:
> [https://github.com/Jahia/karaf-cellar/blob/cellar-4.1.3-jahia-fixes/config/src/main/java/org/apache/karaf/cellar/config/ConfigurationEventHandler.java]
> Cellar uses a ReplicatedMap (Hazelcast) to propagate configurations
> across the cluster, and the replication operation is asynchronous. Thus, if
> the ClusterConfigurationEvent is received before the replication finishes on
> the target node, nothing happens: no error is detected and no retry is
> attempted.
> To reproduce the problem we can use breakpoints (thread-suspending ones):
> * First one, to simulate a long replicate operation, by adding a breakpoint
> on the emitting node in the class
> *com.hazelcast.replicatedmap.impl.operation.ReplicateUpdateOperation.run()*
> * Second one in the cellar event listener that applies the replicated
> configuration:
> *org.apache.karaf.cellar.config.ConfigurationEventHandler.handle()* at line:
> if (!equals(clusterDictionary, localDictionary) &&
> canDistributeConfig(localDictionary)) {
> Now update a configuration on the first node. On the target node, we can
> see that the configuration is not updated when the event is received.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)