Thomas Draier created KARAF-5562:
------------------------------------

             Summary: Improve cellar groups configuration from hazelcast
                 Key: KARAF-5562
                 URL: https://issues.apache.org/jira/browse/KARAF-5562
             Project: Karaf
          Issue Type: Improvement
          Components: cellar-hazelcast
    Affects Versions: 4.1.4, 4.0.10
            Reporter: Thomas Draier


We encountered several issues caused by HazelcastGroupManager. I'm grouping them 
here because they are all linked and we fixed them in a single refactoring of 
the class. Overall, this results in better synchronization of the cellar groups 
configuration.

- Hazelcast network splits can result in very bad behaviour on the “groups” 
shared map - this map contains the list of groups and their members, and the 
system relies entirely on it to know which groups a node belongs to. If multiple 
nodes update the map while they are not connected to each other (easy to 
reproduce by starting both nodes at the same time) and then join afterwards, 
the default merge algorithm is applied and simply overwrites the full map. This 
results in groups losing members, even though the configuration file still 
lists those nodes as members.

- When handling the groups configuration, HazelcastGroupManager replicates the 
felix.fileinstall.filename property, which contains the configuration file 
path, to each node. This is acceptable on a cluster where every node is 
installed at the exact same path. However, with 2 nodes on the same machine at 
different paths, one node will at some point write to the config file of the 
other node and never update its own config, which can be quite confusing.

- The HazelcastGroupManager can start before fileinstall has detected a 
configuration - it then creates a new config based on the hazelcast shared 
config, and that config overrides the config file once fileinstall detects it. 
The impact is not huge, but it shuffles the properties file and makes it 
unreadable.

- Updates from hazelcast to the local config trigger an update back to 
hazelcast, which in turn goes back to the local config and sometimes reverts 
the changes, resulting in no change in the config. When adding a group, a lot 
of properties are updated, and for each of them we trigger a configuration 
update. Each configuration update triggers an event which sends the whole 
config back to hazelcast, including properties that are not updated yet, 
setting them back to their old values. All events (hazelcast updates and osgi 
config) are handled asynchronously - depending on the order of events, some 
properties can be reverted or never added (the groups property is usually 
reverted after a group add).
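
To illustrate the first point, here is a minimal sketch of a merge that keeps 
members from both sides of a split instead of letting one copy of the “groups” 
map overwrite the other. It is plain Java and does not use the Hazelcast or 
Cellar APIs. The class GroupMapMerge, its merge method, and the representation 
of the map as group name -> set of node ids are assumptions made for this 
example only; this is not the code of the actual refactoring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Cellar or Hazelcast. */
public final class GroupMapMerge {

    private GroupMapMerge() {
    }

    /**
     * Merges two views of the groups map (group name -> member node ids)
     * by taking the union of the members of each group, so that neither
     * side of a former split loses members when the nodes rejoin.
     */
    public static Map<String, Set<String>> merge(Map<String, Set<String>> existing,
                                                 Map<String, Set<String>> merging) {
        Map<String, Set<String>> result = new HashMap<>();
        // Copy the existing view so the inputs are never mutated.
        for (Map.Entry<String, Set<String>> e : existing.entrySet()) {
            result.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        // Add the members seen by the other side of the split.
        for (Map.Entry<String, Set<String>> e : merging.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                  .addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Each node registered itself while the two were not connected.
        Map<String, Set<String>> seenByNode1 = new HashMap<>();
        seenByNode1.put("default", new TreeSet<>(Set.of("node1")));
        Map<String, Set<String>> seenByNode2 = new HashMap<>();
        seenByNode2.put("default", new TreeSet<>(Set.of("node2")));

        // prints {default=[node1, node2]}
        System.out.println(merge(seenByNode1, seenByNode2));
    }
}
```

A union merge cannot tell a member that was deliberately removed during the 
split from one that the other side simply never saw, so the local 
configuration file would still have to remain the reference for actual 
membership.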




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
