Jerome Blanchard created UNOMI-874:
--------------------------------------

             Summary: Cluster node config is empty
                 Key: UNOMI-874
                 URL: https://issues.apache.org/jira/browse/UNOMI-874
             Project: Apache Unomi
          Issue Type: Improvement
            Reporter: Jerome Blanchard


We are facing a recurring but intermittent problem in the clustered version of UNOMI:

Sometimes, one of the ClusterNode entries returned by /cxs/cluster has a null
configuration, so its publicHostAddress or internalHostAddress is null. Clients
trying to reach a cluster node therefore have to handle that case. Worse, the
node itself is unreachable because its address is never exposed.
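
A client-side guard for the empty-config case could look like the following
Java sketch. ReachableNodes and its Node record are hypothetical, simplified
stand-ins for the fields returned by /cxs/cluster, not Unomi classes; the sketch
only shows skipping nodes whose address came back null or blank.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReachableNodes {

    // Hypothetical, simplified stand-in for a node as seen by a client of
    // /cxs/cluster; the real Unomi ClusterNode exposes more properties.
    public record Node(String publicHostAddress, String internalHostAddress) {}

    // Keep only nodes with a usable public address, so a node whose
    // configuration came back empty is skipped instead of being called at "null".
    public static List<Node> reachable(List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.publicHostAddress() != null && !n.publicHostAddress().isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("http://node1:8181", "https://node1:9443"),
                new Node(null, null)); // the node with the empty configuration
        System.out.println(reachable(nodes).size()); // prints 1
    }
}
```

This only works around the symptom on the client side; the node with the lost
configuration remains unreachable.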

This may be linked to a Cellar configuration replication bug that causes one of
the nodes to end up with that configuration problem:

[https://issues.apache.org/jira/projects/KARAF/issues/KARAF-7861?filter=allopenissues&orderby=created+DESC%2C+priority+DESC%2C+updated+DESC]

I think the replication problem occurs in ClusterServiceImpl.init():

[https://github.com/apache/unomi/blob/81989bd816f49337d33171541a24daaef0856221/services/src/main/java/org/apache/unomi/services/impl/cluster/ClusterServiceImpl.java#L155]

If another node runs the same init() phase at the same time, the Cellar bug is
triggered and one node's configuration is overwritten by the other's. The result
is a node that exists in the Karaf cluster but has no exposed configuration.
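
The race described above is a classic lost update on a non-atomic
read-modify-write. The Java sketch below replays it deterministically, with a
plain string standing in for the replicated configuration value. The class and
method names and the "nodeId=url" entry format are assumptions for illustration,
not Unomi's or Cellar's actual code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LostUpdateDemo {

    // Hypothetical stand-in for the replicated cluster configuration value:
    // comma-separated "nodeId=url" entries (assumed format).
    static String sharedValue = "";

    // Step 1 of a non-atomic read-modify-write: read the current value.
    static String read() {
        return sharedValue;
    }

    // Step 2: add this node's entry to the snapshot read earlier.
    static String withEntry(String snapshot, String entry) {
        Set<String> entries = new LinkedHashSet<>();
        for (String e : snapshot.split(",")) {
            if (!e.isEmpty()) entries.add(e);
        }
        entries.add(entry);
        return String.join(",", entries);
    }

    // Step 3: write back, blindly overwriting whatever is stored now.
    static void write(String value) {
        sharedValue = value;
    }

    // Nodes initialize one after the other: both entries survive.
    public static String sequentialInit() {
        sharedValue = "";
        write(withEntry(read(), "nodeA=http://a:8181"));
        write(withEntry(read(), "nodeB=http://b:8181"));
        return sharedValue;
    }

    // Unlucky interleaving: both nodes read before either one writes.
    public static String concurrentInit() {
        sharedValue = "";
        String seenByA = read();
        String seenByB = read();
        write(withEntry(seenByA, "nodeA=http://a:8181"));
        write(withEntry(seenByB, "nodeB=http://b:8181")); // overwrites nodeA's entry
        return sharedValue;
    }

    public static void main(String[] args) {
        System.out.println(sequentialInit()); // prints nodeA=http://a:8181,nodeB=http://b:8181
        System.out.println(concurrentInit()); // prints nodeB=http://b:8181
    }
}
```

In the concurrent case nodeA is still a cluster member, but its entry is gone
from the shared value, which matches the symptom of a node without an exposed
configuration.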

When getClusterNodes() then lists the nodes, the global publicURL configuration
(a single string combining the publicURLs of all nodes, separated by commas) has
no entry for that node:

[https://github.com/apache/unomi/blob/81989bd816f49337d33171541a24daaef0856221/services/src/main/java/org/apache/unomi/services/impl/cluster/ClusterServiceImpl.java#L191]
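
As a rough illustration of why the lookup comes back empty, here is a Java
sketch that assumes the combined value holds "nodeId=url" entries. That format
and the EndpointLookup class are assumptions made for this example, not taken
from ClusterServiceImpl.

```java
import java.util.HashMap;
import java.util.Map;

public class EndpointLookup {

    // Parses comma-separated "nodeId=url" entries into a map
    // (the entry format is an assumption for this sketch).
    static Map<String, String> parse(String combined) {
        Map<String, String> byNode = new HashMap<>();
        for (String entry : combined.split(",")) {
            int eq = entry.indexOf('=');
            if (eq > 0) {
                byNode.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
            }
        }
        return byNode;
    }

    // Returns the node's public URL, or null when the shared value has no
    // entry for it, as happens for the node whose configuration was lost.
    public static String publicUrlFor(String nodeId, String combined) {
        return parse(combined).get(nodeId);
    }

    public static void main(String[] args) {
        String combined = "nodeB=http://b:8181"; // nodeA's entry was overwritten
        System.out.println(publicUrlFor("nodeB", combined)); // prints http://b:8181
        System.out.println(publicUrlFor("nodeA", combined)); // prints null
    }
}
```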

I proposed a patch for Karaf Cellar (in the Jahia fork), but it targets version
4.1.3, whereas UNOMI relies on Cellar 4.2.1:

[https://github.com/Jahia/karaf-cellar/commit/76ecb6b1993bfa0e9124ac8437fcfdd87249d048]

Maybe backporting the fix could be an option...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
