Jerome Blanchard created UNOMI-874:
--------------------------------------

             Summary: Cluster node config is empty
                 Key: UNOMI-874
                 URL: https://issues.apache.org/jira/browse/UNOMI-874
             Project: Apache Unomi
          Issue Type: Improvement
            Reporter: Jerome Blanchard

We are facing a recurring but intermittent problem in clustered Unomi deployments: sometimes one of the ClusterNode entries returned by /cxs/cluster has a null configuration, so its publicHostAddress and internalHostAddress are null. Clients have to handle that case when trying to reach cluster nodes, and the affected node is not reachable at all because its address is not exposed.

This may be linked to a Karaf Cellar configuration replication bug that leaves one of the nodes with this configuration problem:
[https://issues.apache.org/jira/projects/KARAF/issues/KARAF-7861?filter=allopenissues&orderby=created+DESC%2C+priority+DESC%2C+updated+DESC]

I think the replication problem is triggered in ClusterServiceImpl.init():
[https://github.com/apache/unomi/blob/81989bd816f49337d33171541a24daaef0856221/services/src/main/java/org/apache/unomi/services/impl/cluster/ClusterServiceImpl.java#L155]

If another node runs its init() phase at the same time, the Cellar bug occurs and one node's configuration is overwritten by the other's. The node then exists in the Karaf cluster but has no configuration exposed.

When the nodes are later listed in getClusterNodes(), the lookup in the global publicURL configuration (a single string combining the public URLs of all nodes, separated by ',') finds no entry for that node:
[https://github.com/apache/unomi/blob/81989bd816f49337d33171541a24daaef0856221/services/src/main/java/org/apache/unomi/services/impl/cluster/ClusterServiceImpl.java#L191]

I proposed a patch for Karaf Cellar (in the Jahia fork), but it targets version 4.1.3, whereas Unomi relies on Cellar 4.2.1:
[https://github.com/Jahia/karaf-cellar/commit/76ecb6b1993bfa0e9124ac8437fcfdd87249d048]

Backporting the fix could be an option.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
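
The failure mode described in the report can be illustrated with a minimal, self-contained sketch. It is not Unomi or Karaf Cellar code: the class LostEndpointSketch, the "nodeId=url" entry format, the node names and the example URLs are all assumptions made for illustration. It only shows how two nodes that read the same snapshot of a combined string, each add their own entry, and write the whole string back can lose one entry, so that a later per-node lookup returns null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LostEndpointSketch {
    // Hypothetical stand-in for the replicated property: "nodeId=url,nodeId=url".
    public static String sharedEndpoints = "";

    // Parse the combined string into a nodeId -> url map.
    public static Map<String, String> parse(String combined) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : combined.split(",")) {
            int eq = entry.indexOf('=');
            if (eq > 0) {
                result.put(entry.substring(0, eq), entry.substring(eq + 1));
            }
        }
        return result;
    }

    // Serialize the map back into the combined string.
    public static String format(Map<String, String> endpoints) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : endpoints.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Per-node lookup; returns null when the node has no entry.
    public static String lookup(String nodeId) {
        return parse(sharedEndpoints).get(nodeId);
    }

    // Replays, sequentially, the interleaving of two simultaneous init() phases.
    public static String simulateConcurrentInit() {
        sharedEndpoints = "";
        // Both nodes read the same (empty) snapshot before either one writes.
        Map<String, String> viewA = parse(sharedEndpoints);
        Map<String, String> viewB = parse(sharedEndpoints);
        viewA.put("nodeA", "https://a.example:9443");
        viewB.put("nodeB", "https://b.example:9443");
        // Each node writes back the whole string; the second write wins.
        sharedEndpoints = format(viewA);
        sharedEndpoints = format(viewB);
        return sharedEndpoints;
    }

    public static void main(String[] args) {
        simulateConcurrentInit();
        System.out.println("nodeA -> " + lookup("nodeA")); // nodeA -> null
        System.out.println("nodeB -> " + lookup("nodeB")); // nodeB -> https://b.example:9443
    }
}
```

In this sketch nodeA still "exists" from the caller's point of view, but its address can no longer be resolved, which matches the symptom seen through /cxs/cluster. Whether the real overwrite happens at this level or inside Cellar's replication is exactly what KARAF-7861 is about, so the sketch should be read as an analogy rather than a diagnosis.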