[
https://issues.apache.org/jira/browse/KARAF-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530029#comment-16530029
]
Matthew Zipay commented on KARAF-5798:
--------------------------------------
I've looked at org.apache.karaf.main.Main and
org.apache.karaf.main.InstanceHelper and I believe I've found the "culprit":
the _karaf.pid_ file does not get written until
{{KarafLockCallback.lockAcquired}} calls {{InstanceHelper.setupShutdown}},
which in turn calls {{InstanceHelper.writePid}}. This confirms my observations.
But this seems nonsensical, as the pid is known well in advance.
{{Main.launch}} actually writes the pid almost immediately to the
_instance.properties_ file (via {{InstanceHelper.updateInstancePid}}), so I
can't see any reason why writing _karaf.pid_ would have to wait until lock
acquisition. (A
[search|https://github.com/apache/karaf/search?q=pidFile+in%3Afile+language%3Ajava]
for "pidFile" in Karaf's GitHub repository yields nothing to suggest that
{{ConfigProperties.pidFile}} is used in any way to influence master/slave
locking behavior.)
On a side note, is there a reason why {{InstanceHelper.writePid}} does not use
{{InstanceHelper.getPid}}? They both obtain the pid by extracting it from
{{java.lang.management.RuntimeMXBean.getName}}, but the former uses a regex
while the latter uses substring & indexOf('@'). I think a single way of
extracting the pid should be chosen and implemented in {{getPid}}, which would
then be called from both {{updateInstancePid}} and {{writePid}}.
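To illustrate the suggestion, here is a rough sketch of one shared pid helper. The class name {{PidSketch}}, the method {{parsePid}}, and the signatures are hypothetical and are not Karaf's actual code; it also assumes that {{RuntimeMXBean.getName}} returns a value of the form "pid@hostname", which is common but JVM-dependent.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Karaf's InstanceHelper: a single place that
// extracts the pid, reused by every caller that needs it.
class PidSketch {

    // Parses a runtime name of the assumed form "<pid>@<hostname>".
    static long parsePid(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            throw new IllegalStateException("Unexpected runtime name: " + runtimeName);
        }
        return Long.parseLong(runtimeName.substring(0, at));
    }

    // The one extraction path; both the instance.properties update and the
    // pid file write would go through this method.
    static long getPid() {
        return parsePid(ManagementFactory.getRuntimeMXBean().getName());
    }

    // Writes the pid to the given file. Nothing here depends on the lock,
    // so it could run as early as the instance.properties update does.
    static void writePid(Path pidFile) throws IOException {
        Files.write(pidFile, Long.toString(getPid()).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("karaf", ".pid");
        writePid(tmp);
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

With the parsing isolated in one method, the regex vs. indexOf question only has to be settled once, and the pid written to _karaf.pid_ cannot diverge from the one written to _instance.properties_.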
> Karaf slave instance does not write pid or port file until it becomes master
> ----------------------------------------------------------------------------
>
> Key: KARAF-5798
> URL: https://issues.apache.org/jira/browse/KARAF-5798
> Project: Karaf
> Issue Type: Bug
> Components: karaf-boot
> Affects Versions: 4.0.9
> Reporter: Matthew Zipay
> Priority: Major
>
> In a Karaf master/slave environment, the slave process does not write its pid
> or port file until it acquires the lock and becomes the master.
> I am running Karaf 4.0.9 (ServiceMix 7.0.1).
> Karaf is configured as master/slave using the following from
> system.properties. Master and slave are on different physical nodes.
> {code:java}
> karaf.lock=true
> karaf.lock.class=org.apache.karaf.main.lock.OracleJDBCLock
> karaf.lock.level=79
> karaf.lock.delay=10000
> karaf.lock.jdbc.url=jdbc:oracle:thin:#REMOVED#
> karaf.lock.jdbc.driver=oracle.jdbc.driver.OracleDriver
> karaf.lock.jdbc.user=#REMOVED#
> karaf.lock.jdbc.password=#REMOVED#
> karaf.lock.jdbc.table=KARAF_LOCK
> karaf.lock.jdbc.clustername=karaf
> karaf.lock.jdbc.timeout=30
> karaf.lock.slave.block=false
> {code}
> Attempting to stop the slave Karaf process results in _"Can't connect to the
> container. The container is not running."_ This is not true, as a simple {{ps
> -ef | grep karaf}} confirms that it is in fact running. I am able to enter
> the Karaf shell just fine, use the web console, etc.
> I have confirmed through multiple tests that the pid and port files don't get
> written until the master lock is acquired.
> Steps:
> # With the Karaf slave node not started, note the pid and port files do not
> exist (or contain outdated values from a previous process).
> # Start the Karaf slave process.
> # Note that the pid and port files have not been written.
> # Stop the master process.
> # Observe the slave process acquire the lock and become master.
> # Note that the pid and port files have now been written.