Kurt Westerfeld created KARAF-6224:
--------------------------------------
Summary: Race condition in BaseActivator on first launch
Key: KARAF-6224
URL: https://issues.apache.org/jira/browse/KARAF-6224
Project: Karaf
Issue Type: Bug
Components: karaf
Affects Versions: 4.2.4, 4.1.7, 4.0.10
Reporter: Kurt Westerfeld

We run several Karaf containers on a single machine with a large number of
cores (20). Because the high core count appears to be a factor, this may be
hard to reproduce elsewhere. We have customized the RMI and JMX ports for each
container so that they do not conflict. However, after the first Karaf VM
launches and claims ports 1099/44444, the second VM briefly attempts to claim
the same ports before its customized configuration has been read from the
${karaf.etc} directory. You can see the management bundle start, immediately
followed by a configuration update with the corrected values.

Looking over BaseActivator, it appears that a thread is created to dispatch
the initialization, and this thread sometimes finds the "config" field still
null because the asynchronous managed service event has not yet arrived. In
that case no configuration is available and the defaults are used, so the
container briefly tries to bind ports 1099 and 44444. Once the first managed
service event arrives through the updated() method, the service immediately
reconfigures itself and uses the proper customized values.
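
The ordering problem described above can be sketched in plain Java. This is
only an illustration, not Karaf's actual BaseActivator code: the class
RaceSketch, its method resolveRegistryPort(), and the use of a
Map<String, String> for the configuration are hypothetical, and the property
name "rmiRegistryPort" is assumed to be the one used in the management
configuration.

```java
import java.util.Map;

// Hypothetical stand-in for an activator; NOT Karaf's BaseActivator.
final class RaceSketch {
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;

    // Set by the asynchronous managed-service callback; null until it fires.
    private volatile Map<String, String> config;

    // Mimics the callback that delivers the customized configuration.
    void updated(Map<String, String> properties) {
        this.config = properties;
    }

    // Mimics the initialization thread deciding which port to bind.
    int resolveRegistryPort() {
        Map<String, String> current = config;
        if (current == null) {
            // The callback has not arrived yet, so the default wins the race.
            return DEFAULT_RMI_REGISTRY_PORT;
        }
        String value = current.get("rmiRegistryPort"); // assumed property name
        return value == null ? DEFAULT_RMI_REGISTRY_PORT : Integer.parseInt(value);
    }
}
```

If resolveRegistryPort() runs before updated(), it returns the default 1099
even though a customized value exists on disk; once updated() has been called,
the same method returns the customized port.
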

This is a problem for us because, during that brief window, a client can
mistakenly connect to the wrong container. We use JMX over RMI to perform a
number of management operations, and this makes initial startup unreliable.
Our three Karaf containers have some interdependencies, and this temporary
condition causes problems for them.

The problem occurs less often on subsequent restarts, which suggests that the
initial provisioning of ${karaf.etc} is part of the race. It is rarer after
that, but we have seen it happen at any time. We believe the high core count
of the server is what exposes the race condition.

The suggested fix is to query Config Admin in run() and read the configuration
when this.config is null. That would handle the race here, though we are not
sure whether it could cause other bad interactions with Config Admin.
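
A minimal sketch of that idea follows, under stated assumptions. The class
EagerConfigSketch and its members are hypothetical, and the Supplier stands in
for whatever Config Admin lookup the real fix would perform. It is not taken
from Karaf's source and does not settle the open question about side effects.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of the suggested fix; NOT Karaf's BaseActivator.
final class EagerConfigSketch {
    static final int DEFAULT_RMI_REGISTRY_PORT = 1099;

    // Set by the asynchronous managed-service callback; null until it fires.
    private volatile Map<String, String> config;

    // Stand-in for a direct Config Admin query; may return null if nothing is stored.
    private final Supplier<Map<String, String>> configAdminLookup;

    EagerConfigSketch(Supplier<Map<String, String>> configAdminLookup) {
        this.configAdminLookup = configAdminLookup;
    }

    void updated(Map<String, String> properties) {
        this.config = properties;
    }

    // Mimics run(): if the callback has not arrived, ask Config Admin directly.
    int resolveRegistryPort() {
        Map<String, String> current = config;
        if (current == null) {
            current = configAdminLookup.get();
        }
        if (current == null) {
            // Nothing stored anywhere, so the defaults are genuinely correct.
            return DEFAULT_RMI_REGISTRY_PORT;
        }
        String value = current.get("rmiRegistryPort"); // assumed property name
        return value == null ? DEFAULT_RMI_REGISTRY_PORT : Integer.parseInt(value);
    }
}
```

In an OSGi bundle the supplier might wrap
ConfigurationAdmin.getConfiguration(pid, null).getProperties(), which returns
null when no properties have been set for that PID. Note that this sketch does
not cache the looked-up value, so a later updated() call still takes
precedence.
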
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)